Abstract



The automation of meta-theoretic aspects of formal systems typically requires the 
treatment of syntactically complex objects. Thus, programs must be represented 
and manipulated by program development systems, mathematical expressions by 
computer-based algebraic systems, and logic formulas and proofs by automatic proof 
systems and proof assistants. The notion of bound variables plays an important role 
in the structures of such syntactic objects, and should therefore be reflected in their 
representations and properly accounted for in their manipulation. The λ-calculus
was designed specifically to treat binding in a logically precise way and the terms of 
such a calculus turn out to be an especially suitable representational device for the 
application tasks of interest. Moreover, the equality relation associated with these 
terms and the accompanying notion of higher-order unification lead to a convenient
means for analyzing and decomposing these representations in a way that respects 
the binding structure inherent in the formal objects. 

This thesis concerns the language λProlog that has been designed to provide
support for the kinds of meta-programming tasks discussed above. In its essence,
λProlog is a logic programming language that builds on a conventional language like
Prolog by using typed λ-terms instead of first-order terms as data structures, by using
higher-order unification rather than first-order unification to manipulate these data
structures and by including new devices for restricting the scopes of names and of
code and thereby providing the basis for realizing recursion over binding constructs.
These features make λProlog a convenient programming vehicle in the domain of
interest. However, they also raise significant implementation questions that must be
addressed adequately if the language is to be an effective tool in these contexts. It is
this task that is undertaken in this thesis.

An efficient implementation of λProlog can potentially exploit the processing
structure that has been previously designed for realizing Prolog. In this context,
the main new issue to be treated becomes that of higher-order unification. This
computation has characteristics that make it difficult to embed it effectively within a
low-level implementation: higher-order unification is in general undecidable, it does
not admit a notion of most general unifiers and a branching search is involved in the
task of looking for unifiers. However, a sub-class of this computation that is referred
to as Lλ or higher-order pattern unification has been discovered that is substantially
better behaved: in particular, for this class, unification is decidable, most general
unifiers exist and a deterministic unification procedure can be provided. This class
is also interesting from a programming point of view: most natural computations
carried out using λProlog fall within it. Finally, a treatment of full higher-order
unification within the context of λProlog can be realized by solving only higher-order
pattern unification problems at intermediate stages, delaying any branching and
possibly costly search to the end of the computation.

This thesis examines the use of the strategy described above in providing an 
implementation of λProlog. In particular, it develops a new virtual machine and
compilation based scheme for the language by embedding a higher-order pattern uni- 
fication algorithm due to Nadathur and Linnell within the well-known Warren Ab- 
stract Machine model for Prolog. In executing this idea, it exposes and treats various 
auxiliary issues such as the low-level representation of λ-terms, the implementation
of reduction on such terms, the optimized processing of types in computation and 
the representation of unification problems whose solution must be deferred till a later 
point in computation. Another important component of this thesis is the development
of an actual implementation of λProlog, called Teyjus Version 2, that is based
on the conceptual design that is presented. This system contains an emulator for the 
virtual machine that is written in the C language for efficiency and a compiler that is 
written in the OCaml language so as to enhance readability and extensibility. This 
mix of languages within one system raises interesting software issues that the thesis
addresses. Portability across architectures for the emulator is also treated by developing
a modular mapping from term representation to actual machine structures. A final 
contribution of the thesis is an assessment of the efficacy of the various design ideas 
through experiments carried out with the assistance of the system. 
Chapter 1



Introduction 

This thesis is concerned with the implementation of a higher-order logic programming
language called λProlog. This language is of interest because it provides perspicuous
and effective ways for realizing computations over formal objects such as programs,
mathematical expressions, logical formulas, and proofs. Computations of this kind are
frequently needed in meta-level application tasks such as those involved in building
program development systems [27], automated algebraic systems [9, 11], automatic
reasoning systems [12, 25], and proof assistants [3, 5, 17, 53]. In this chapter we
motivate the λProlog language from the perspective of such applications, explain
what is involved in implementing it well and then characterize the contributions of
this thesis.

1.1 Using λ-terms as Data Structures

An important first step in building systems that manipulate formal objects is the
design of a convenient representation for such objects. When we examine the specific
programming tasks, it turns out that in many of them there is a need to deal with
syntactic constructs that involve a notion of binding. As an example, consider a
theorem proving system that manipulates quantificational formulas. When representing
a formula such as ∀x P(x), where P(x) denotes an arbitrary formula in which x may
appear free, it is necessary to capture the scoping aspect of the quantifier as well as
the fact that the particular choice of name for the quantified variable is not
significant. These properties will be necessary, for example, in correctly instantiating the
quantifier when needed (we have to be careful not to substitute terms for x which
contain variables that get captured by quantifiers appearing further inside P(x)) and
in recognizing that the formula ∀x P(x) is really the same as ∀y P(y). Similar
observations can be made with respect to the representation of programs in a program
manipulation system. Here, it is necessary to encode functions in such a way that
the binding aspects of arguments and issues of scope are clearly recognized in the
course of analyzing and transforming their structures. Some of these aspects can be
illustrated by considering the simple setting of the λ-calculus that underlies the idea
of functions in programming languages. Suppose, for example, that we want to write
an evaluator for the λ-calculus. In this setting, we have to be able to transform an
expression of the form ((λx M) N) into one that is obtained by replacing the free
occurrences of x in M by N. In carrying out this operation, we have to be able to
distinguish free occurrences of x from the bound ones and we also have to be careful
to not allow any free variable in N to be captured by an abstraction within M.
Moreover, a prerequisite for applying such a transformation is that we have to be able to
recognize that a term has the form ((λx M) N) even if the abstracted variable in the
"function part" is not exactly named x.

A careful examination of the examples discussed above shows that even though the
application domains are quite different, there is a common part to what needs to be
treated with regard to binding in both cases. The important aspects of binding can in
fact be uniformly captured by using the terms of the λ-calculus as a representational
mechanism. For example, the concept of the scope of a binding is explicitly reflected
in the structure of a λ-abstraction. Similarly, the recognition of the irrelevance of
names for bound variables and the preservation of their scopes during substitution
are manifest through the usual λ-conversion rules. Thus, the representation of formal
objects relevant to different contexts can be accomplished by using λ-abstractions
to capture the underlying binding structures and using constructors, as in first-order
abstract syntax representations, to encode whatever context specific semantics
is relevant to the analysis.
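
The care that the evaluator example demands (distinguishing free from bound occurrences, and renaming a binder when it would capture a free variable of the substituted term) can be made concrete with a small sketch. The Python below is only an illustration over a named representation of untyped λ-terms; the class names, the helper fresh and its naming scheme v0, v1, ... are assumptions introduced here and are not part of the thesis or of any λProlog implementation.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

def free_vars(t):
    """Return the set of names that occur free in t."""
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.param}
    return free_vars(t.fun) | free_vars(t.arg)

def fresh(avoid):
    """Pick a name outside the set avoid (hypothetical scheme: v0, v1, ...)."""
    return next(f"v{i}" for i in count() if f"v{i}" not in avoid)

def subst(t, x, n):
    """Replace the free occurrences of x in t by n without capturing variables of n."""
    if isinstance(t, Var):
        return n if t.name == x else t
    if isinstance(t, App):
        return App(subst(t.fun, x, n), subst(t.arg, x, n))
    if t.param == x:                 # x is rebound here, so nothing below is free
        return t
    if t.param in free_vars(n):      # rename the binder to avoid capture
        z = fresh(free_vars(n) | free_vars(t.body) | {x})
        return Lam(z, subst(subst(t.body, t.param, Var(z)), x, n))
    return Lam(t.param, subst(t.body, x, n))

def beta(t):
    """Contract a top-level redex ((λx M) N); return t unchanged otherwise."""
    if isinstance(t, App) and isinstance(t.fun, Lam):
        return subst(t.fun.body, t.fun.param, t.arg)
    return t

# ((λx (λy (x y))) y): the inner binder y must be renamed before y is substituted for x
reduced = beta(App(Lam("x", Lam("y", App(Var("x"), Var("y")))), Var("y")))
```

Every program that manipulates binding in a first-order representation has to carry bookkeeping of this kind; the point made in this section is that a meta-language built on λ-terms provides it once and for all.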

As an illustration of this idea, consider the formula ∀x P(x) mentioned in the
theorem proving example. This formula can be represented by the λ-term (all (λx P(x))),
where all is a constructor chosen to denote the universal quantifier, and P(x)
represents, recursively, the formula P(x). This representation separates out the two
different roles of a universal quantifier, one of which corresponds to imposing the "for
all" semantics and the other that indicates the scope of the quantification, and it
captures the latter explicitly through a λ-abstraction. Using this representation, the
instantiation of the universal quantifier of the given formula can be simply denoted
as a λ-application of the form ((λx P(x)) t), where t is the representation of the
object-level term that the quantifier is to be instantiated with. This "application term"
is equivalent under the rules of λ-conversion to a term that results from replacing
each occurrence of x in P(x) with t, being careful, of course, to avoid any inadvertent
capture of free variables in t. Similarly, the object-level λ-term ((λx M) N) can be
represented by the expression (app (abs (λx M)) N); notice that app and abs are
constructors chosen to encode object language application and abstraction in this
representation, and the binding effect of an object-level abstraction is captured by an
abstraction of the meta-language. With this kind of representation, we can describe
the evaluation rule that was of interest earlier as simply that of rewriting an
expression of the form (app (abs T) R) to the form (T R); the meta-language understanding
of λ-terms ensures then that the required substitution operation will be carried out
in a logically correct manner.
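
The rewriting of (app (abs T) R) to (T R) can be sketched in a general-purpose language by letting host-language functions play the role of meta-level abstractions. The Python below is a loose analogy under that assumption: the classes App and Abs mirror the constructors app and abs of the text, while Const, step and the sample term k are names introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class App:          # plays the role of the constructor app
    fun: object
    arg: object

@dataclass(frozen=True)
class Abs:          # plays the role of abs; its argument is a meta-level function
    body: Callable[[object], object]

def step(t):
    """Rewrite (app (abs T) R) to (T R); other terms are returned unchanged."""
    if isinstance(t, App) and isinstance(t.fun, Abs):
        return t.fun.body(t.arg)   # meta-level application performs the substitution
    return t

# Object-level term ((λx (λy x)) c); no renaming code is needed anywhere
k = Abs(lambda x: Abs(lambda y: x))
result = step(App(k, Const("c")))
```

The analogy is imperfect in one important respect: a Python function is opaque, whereas the λ-terms of the meta-language discussed here can be compared up to λ-conversion and taken apart by unification, which is what makes them usable as data structures.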
Our interest in this thesis is in a language for carrying out computations over
formal objects. From this perspective, what we desire is a language that allows us
to use λ-terms as a means for representing objects and that provides primitives for
manipulating these in a logically meaningful way. The logic programming language
λProlog [43] is one of this sort. It is based on a higher-order logic built around a typed
version of the λ-calculus. The presence of λ-terms as basic data structures in this
language provides the convenience discussed earlier in this section in representing formal
objects, and therefore renders the language an especially suitable tool for describing
formal systems. This language attributes operational semantics to logical connectives
and quantifiers, so that these logical symbols can also be viewed as programming
primitives. As a result, the language allows for the construction of descriptions of
formal systems that can be viewed as specifications but that are also executable as
programs. In comparison with usual logic programming languages, λProlog provides
two new logical devices for specifying the scopes of names and of clauses defining
predicates. From the programming perspective, these devices turn out to be helpful
in describing recursive computations over binding structure. Many uses have been
made of these various features of λProlog in describing interesting computations over
formal objects; see, for example, [2, 16, 22, 43, 52, 54]. These kinds of applications
motivate the development of an efficient implementation of this language, a topic that
is the focus of this thesis.



1.2 Using Higher-Order Unification for Computation 

An important part of the computational machinery underlying λProlog is a realization
of unification over λ-terms. This form of unification, known as higher-order
unification, differs from the one used in a language like Prolog in that equality
between terms is based not just on identity but also on the conversion rules of the
λ-calculus. Pragmatically, this operation is the basis for analyzing the shapes of
syntactic structures that involve binding: for example, it is this form of unification that
allows ∀x (P(x) ∧ Q(x)) to be used as a template for matching with formulas that have
a particular form and for decomposing them into the parts corresponding to the
conjuncts embedded inside the universal quantification if they do have this form. Most
existing implementations of λProlog realize higher-order unification based on a
procedure described by Huet. While higher-order unification seems a necessary operation
within λProlog, it unfortunately also turns out to be one that has poor theoretical
properties. For example, it does not admit most general unifiers, a possibly
redundant search may be involved in calculating unifiers and unifiability is, in the limit,
undecidable. These kinds of properties manifest themselves in Huet's procedure by
giving it a non-deterministic branching structure, by restricting it to calculating only
pre-unifiers so as to avoid redundancy and by making it a possibly non-terminating
computation. Embedding such a procedure within a larger language implementation
is difficult and can also make it difficult to realize other associated operations in an
efficient manner.

While the situation with employing higher-order unification in a practical way
seems difficult at first sight, the signs from looking at actual attempts to employ it
are much more hopeful. In particular, from using a system realizing λProlog based on
Huet's procedure [45], and also from using other logical frameworks and proof
assistants such as Twelf [58] and Isabelle [51] that employ higher-order unification, it
becomes evident that there is a large collection of practically relevant meta-programming
tasks in which the relevant higher-order unification problems actually have unique
solutions that can be completely revealed even by using Huet's pre-unification
procedure. Based on a study of the usage of higher-order unification in these examples,
Dale Miller has identified a subset of the general problems, known as the Lλ or the
higher-order pattern class [36, 50], which covers the major cases of the unification
problems occurring in practice [33]. Unifiability on this subset is known to be
decidable and it is also known that a single most general unifier can be provided in any
of the cases where a unifier exists. In fact, Miller has described a (non-deterministic)
algorithm for solving higher-order pattern unification problems that has the
characteristic of either determining non-unifiability or producing a most general unifier
at the end. The ideas underlying this procedure have been extended to dependently
typed λ-calculi [55, 56] and higher-order rewrite systems [49].
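
The restriction that defines the higher-order pattern class is not spelled out in this chapter. As it is commonly stated, a term belongs to the class when every occurrence of a variable that unification may instantiate is applied only to distinct bound variables. The Python sketch below checks a simplified form of this condition; it assumes terms in β-normal form, considers only λ-bound variables (not variables introduced by universal quantifiers), and does not recognize η-expanded arguments. All class and function names are assumptions made for the illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bound:      # variable bound by an enclosing abstraction
    name: str

@dataclass(frozen=True)
class Meta:       # variable that unification may instantiate
    name: str

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:        # a head applied to a tuple of arguments
    head: object
    args: tuple

def is_pattern(t, scope=frozenset()):
    """Simplified membership test for the higher-order pattern class."""
    if isinstance(t, Lam):
        return is_pattern(t.body, scope | {t.param})
    if isinstance(t, App):
        if isinstance(t.head, Meta):
            # arguments must all be bound variables in scope, pairwise distinct
            names = [a.name for a in t.args
                     if isinstance(a, Bound) and a.name in scope]
            return len(names) == len(t.args) and len(set(names)) == len(names)
        return is_pattern(t.head, scope) and all(is_pattern(a, scope) for a in t.args)
    return True   # constants, bound variables and unapplied Meta variables

# (all (λx (P x))) is a pattern; (λx (F x x)) is not
quantified = App(Const("all"), (Lam("x", App(Meta("P"), (Bound("x"),))),))
repeated = Lam("x", App(Meta("F"), (Bound("x"), Bound("x"))))
```

Intuitively, the restriction ensures that an equation such as (F x y) = t determines F uniquely as an abstraction over x and y, which is why most general unifiers exist for this class.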

It turns out that Huet's procedure is also effective when applied to higher-order 
pattern unification problems in that it is guaranteed to terminate and will do so with 
a unique successful branch. One may wonder therefore if there is any purpose to 
describing specialized unification procedures for this subclass and it is important to 
address this question to put the work in this thesis in perspective. There are, in fact, 
particular pragmatically significant ways in which the behavior of Huet's procedure 
can be improved by taking the restriction seriously. First, even though the (pre)- 
solution found is unique, Huet's procedure conducts a branching search to find it; 
it must do this since it needs to also address more general higher-order unification 
problems. It turns out that if one is not concerned about covering the larger class then 
the intermediate steps can also be made deterministic. Second, even when restricted 
to the higher-order pattern fragment, Huet's procedure is guaranteed only to find 
pre-unifiers; in some instances, it will return with a substitution and a remaining 
solvable problem but one that it chooses not to solve. When focusing only on the 
higher-order pattern unification class, however, it is possible to provide a different 
unification algorithm that will solve the problem entirely. Finally, the structures of 
solutions to general higher-order unification problems depend on the types of the
terms being unified, and consequently Huet's procedure examines these types
during computation. However, for the higher-order pattern fragment, it is possible 
to structure computation so that it does not depend on type information. This has a 
practical significance since it is, in general, an expensive proposition to compute and 
carry around type annotations with terms during execution. 

Several implementations of λProlog have been described prior to this thesis and
one of them, Teyjus Version 1 [45], even considers a compilation-based realization
from which this thesis borrows heavily. All these implementations embed within them
Huet's treatment of higher-order unification. The distinguishing feature of the work
described here is that it analyses the implementation of λProlog based on a model that
treats only higher-order pattern unification. The observations in this section indicate
a merit to considering this question: the higher-order pattern fragment is practically
relevant and restricting to only this class can have an impact on the computational
model that is important to understand.

1.3 Contributions of the Thesis 

This thesis explores the idea of orienting an implementation of λProlog around a
particular higher-order pattern unification algorithm, namely the one proposed by Nadathur
and Linnell [42]. More specifically, it considers the full λProlog language, i.e., it
does not restrict the syntax of this language in any way. However, when unification
problems are encountered, they are solved completely only if they fall within the Lλ
fragment; more general problems are deferred and later solved only if instantiations
convert them into ones in this subset.

The implementation that is developed is based on using a special abstract machine
for λProlog and on compiling programs in the language into instructions for this
machine. The basic framework for the machine is provided by the WAM, the abstract
machine that D. H. D. Warren designed for Prolog [63]. The main new challenge in
this work is to embed pattern unification into the WAM that was originally designed
to treat only first-order terms.^1 There are several issues that must be considered in
realizing such an embedding in a practically acceptable fashion. One class of such
issues arises from the fact that a richer class of terms (the terms of a λ-calculus instead
of just first-order terms) has to be represented and manipulated. The machine
representation that is chosen for such terms should, at the outset, facilitate an efficient
equality examination between terms based on λ-conversions; in particular, it should
support well the recognition of equality between terms that differ only in the names
of bound variables and should also provide an efficient realization of β-reduction or
function evaluation. Beyond this, it is important to treat efficiently the typical
decomposition of terms that is needed in the course of pattern unification. For example,
it is often necessary to get quickly to the head of a term and this is best realized in
a scheme that represents nested applications in a form that collects the successive
arguments into a vector form and directly exposes the embedded head. A similar
argument can be made for collecting a sequence of abstractions into a single
abstraction over several variables. A second issue that needs to be treated is the seamless
integration of the richer higher-order pattern unification into the compiled treatment
of first-order unification that is the hallmark of the WAM. In this context, we note
that the treatment must also contain within it a suitable mechanism for delaying
higher-order unification problems that do not fall within the higher-order pattern
class. A final issue that we mention here is that of treating the polymorphic typing
regime that is part of the λProlog language. A consequence of this polymorphism
is that the particular type instances must be known when comparing two constants
that otherwise have the same name; the ultimate identity of these constants must,
in this case, be based on an equality of their types. Although pattern unification
does not need types in deciding the structures of unifiers, the role of types mentioned
above makes it necessary to sometimes examine these dynamically to decide
unifiability of terms. An efficient runtime type processing scheme should then be provided, in
which types are maintained and examined only for the identity checking of constants.

^1 In comparison with Prolog, λProlog has additional search primitives and also permits
quantification over predicates. However, we add nothing new to the treatment of these
aspects, simply inheriting them from Teyjus Version 1 that is discussed later.
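
The benefit of collecting arguments into a vector and abstractions into a single multi-variable abstraction can be illustrated with a short sketch. The Python below converts a term built from binary applications and single-variable abstractions into such a flattened form. It is only an illustration of the idea: the actual machine-level representation is the subject of Chapter 5, and the names Sym, App, Lam, SpineApp, MultiLam and flatten are assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:            # constant or variable
    name: str

@dataclass(frozen=True)
class App:            # binary application (fun arg)
    fun: object
    arg: object

@dataclass(frozen=True)
class Lam:            # abstraction over a single variable
    param: str
    body: object

@dataclass(frozen=True)
class SpineApp:       # head together with the vector of all its arguments
    head: object
    args: tuple

@dataclass(frozen=True)
class MultiLam:       # one abstraction over several variables
    params: tuple
    body: object

def flatten(t):
    """Collect nested applications and nested abstractions into vector form."""
    if isinstance(t, Lam):
        body = flatten(t.body)
        if isinstance(body, MultiLam):
            return MultiLam((t.param,) + body.params, body.body)
        return MultiLam((t.param,), body)
    if isinstance(t, App):
        fun, arg = flatten(t.fun), flatten(t.arg)
        if isinstance(fun, SpineApp):
            return SpineApp(fun.head, fun.args + (arg,))
        return SpineApp(fun, (arg,))
    return t

# ((f a) b) becomes f applied to the vector (a, b); the head is one field access away
flat = flatten(App(App(Sym("f"), Sym("a")), Sym("b")))
```

In the nested form, reaching the head of an application with n arguments requires following n pointers; in the flattened form it is available directly, which matters because unification inspects heads repeatedly.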

This thesis addresses these various issues and proposes solutions to them. Towards 
providing an efficient realization of reduction over λ-terms, it exploits the idea of an
explicit substitution notation for such terms [1, 48]. It further considers particular 
reduction procedures that can be used with such a representation towards getting the 
best time and space performance. To treat equivalence under bound variable renam- 
ing, it uses a nameless representation for such variables in the style of de Bruijn [8]. 
The ability to treat substitutions explicitly is exploited in distributing this operation 
over the steps that need to be performed in realizing unification towards minimizing 
redundant computations. The low-level representation of terms in the explicit sub- 
stitution form pays attention to how applications and abstractions are encoded so as 
to obtain fast access to the components that need to be examined often in the course 
of unification. The instruction set of the WAM is enhanced towards integrating the 
treatment of higher-order pattern unification into the standard compilation model. 
The particular approach that is used here is to develop these instructions so that 
first-order unification is still treated via compilation whereas the new components 
in higher-order pattern unification lead to the invocation of an interpretive phase. 
When parts of the unification problem fall outside the higher-order pattern class,
these are carried into subsequent computations in the form of constraints that may
be addressed later. A practical representation is proposed for such residual problems 
and the addition to these problems as well as their re-examination is integrated into 
the instructions for the abstract machine. Finally, the issue of runtime type process- 
ing is treated by first developing a static analysis process that reduces the footprint 
of such types considerably and then including instructions in the abstract machine 
to treat the remaining aspects as much as possible through compiled code. 
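
The nameless representation of bound variables mentioned above can be sketched as follows. In this Python illustration, each bound variable occurrence is replaced by the number of abstractions between it and its binder, counting from 1, so that terms differing only in the names of bound variables receive identical representations. The class names and the function to_de_bruijn are assumptions made for the sketch, which also leaves out the explicit substitution machinery that the thesis combines with this notation.

```python
from dataclasses import dataclass

# Named terms
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

# Nameless terms
@dataclass(frozen=True)
class Idx:        # bound variable: distance to its binder, starting at 1
    n: int

@dataclass(frozen=True)
class Free:       # free variable or constant keeps its name
    name: str

@dataclass(frozen=True)
class DLam:       # abstraction with no variable name
    body: object

@dataclass(frozen=True)
class DApp:
    fun: object
    arg: object

def to_de_bruijn(t, env=()):
    """Translate a named term; env lists enclosing binders, innermost first."""
    if isinstance(t, Var):
        return Idx(env.index(t.name) + 1) if t.name in env else Free(t.name)
    if isinstance(t, Lam):
        return DLam(to_de_bruijn(t.body, (t.param,) + env))
    return DApp(to_de_bruijn(t.fun, env), to_de_bruijn(t.arg, env))

# (λx (λy (x y))) and (λa (λb (a b))) receive the same nameless form
left = to_de_bruijn(Lam("x", Lam("y", App(Var("x"), Var("y")))))
right = to_de_bruijn(Lam("a", Lam("b", App(Var("a"), Var("b")))))
```

With this representation, equality up to bound variable renaming reduces to structural identity, so no renaming step is needed when terms are compared.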

In addition to proposing an implementation scheme for the λProlog language,
this thesis also develops an actual implementation of the conceptual design that it 
produces. A characteristic of this part of the work is a careful attention to the issue 
of portability across varied architectures and operating systems. Towards this end, 
a modular method is developed for mapping the abstract machine onto the low-level 
hardware on which it is emulated. Another aspect to which close consideration has 
been given is that of enhancing the flexibility and expandability of the implementa- 
tion. To realize this goal, an attempt is made to use as much as possible a high-level 
language — here the language OCaml — in the implementation, employing the language 
C only in realizing those parts whose efficiency depends critically on the closeness to 
the underlying hardware. This mix of implementation languages raises interesting 
problems of its own that we discuss later in the thesis. We note finally that having 
an implemented system gives us the ability to test the efficacy of our various design 
ideas, a topic that we also consider in this thesis.

In summary, the contributions of this thesis are threefold: 

1. The design of an abstract machine and associated compilation methods for 
treating the λProlog language. A key characteristic of the abstract machine
that is developed is that it attempts to exploit the efficiencies that arise out of 
focusing on higher-order pattern unification rather than treating more general 
forms of unification for λ-terms.

2. An actual implementation of λProlog, Version 2 of the Teyjus system, based
on the virtual machine and compilation scheme developed. This implementation 
has proven to be extremely portable and also combines components written in 
the C and the OCaml languages towards enhancing openness and expandability 
in its structure. 

3. A study of the performance impact of using higher-order pattern unification, 
optimized runtime type processing and other related design ideas. This study is
based on experiments conducted with Teyjus Version 2 using practical λProlog
programs that exploit the meta-programming capabilities of the language. 

Prior to the work of this thesis, another abstract machine that is organized around
Huet's unification procedure had been designed for λProlog [29, 40, 41]. This abstract
machine has in fact provided the basis for Version 1 of the Teyjus system that we
have mentioned earlier. Many challenges faced in realizing the new search primitives
and higher-order features present in λProlog were considered for the first time in the
context of that work and the design presented in this thesis has been influenced by the
ideas developed there. However, the work undertaken in this thesis differs significantly 
from the previous design and implementation in that it takes seriously the idea of 
realizing a higher-order logic programming language through the narrower mechanism 
of treating higher-order pattern unification. In particular, it examines carefully the 
impact of this decision on various aspects of the structure of the abstract machine 
and of the efficiency of implementation. An auxiliary aspect of this work is that it 
has resulted in a system that is far more portable and expandable because of the 
particular approaches that have been used in its implementation. 

A central idea underlying this thesis is that of approaching higher-order unifi- 
cation through higher-order pattern unification. It is important to stress that the 
use of this idea is by itself not novel to our work: in particular, this idea has been 
employed previously in the proof assistant Isabelle [51] and in the logical framework 
Twelf [58]. The particular deployment of this idea in the Isabelle system is, in our 
understanding, quite different from the method we use in this thesis: Isabelle first 
tries to solve unification problems by means of a higher-order pattern unification pro- 
cedure and, if this does not succeed, it then falls back to full higher-order unification. 
By contrast, the method we use is quite similar to that employed within Twelf: in
both, a higher-order pattern unification procedure is all that is used and problems
that do not fall within the class that this procedure is capable of handling are
deferred till a later point in the computation. The distinguishing characteristic of 
our work in this context is that it explores the impact of this idea on the design of 
an abstract machine and compilation model for the underlying logic programming 
language. Another aspect of our work is that it attempts to quantify the benefits of 
using this approach through a head-to-head comparison with an implementation that
uses Huet's unification procedure directly.

1.4 Organization of the Thesis 

The rest of the thesis is organized as follows. Chapter 2 provides an overview of
the λProlog language. The discussion here illustrates the usefulness of the higher-order
features of the language in describing formal systems and provides an intuitive
understanding of the underlying computation model. Chapter 3 describes the notion
of equality of λ-terms that is based on λ-conversion. This chapter also introduces
an explicit substitution based representation of such terms, which facilitates efficient
term comparison based on the relevant notion of equality. An abstract interpreter for
the λProlog language is presented in Chapter 4 for the purpose of formally defining
the model of computation underlying this language. The role unification plays in 
this computational model is discussed and a practical higher-order pattern unifica- 
tion algorithm is introduced. The low-level term representation scheme used in the 
implementation developed by this thesis is discussed in Chapter 5. This discussion 
includes the presentation of an algorithm that efficiently realizes β-reduction based
on the explicit substitution representation discussed in Chapter 3. Also presented in 
the chapter is a refinement to the term representation geared towards providing fast 
access to the subcomponents that are needed by the term decomposition operations 
used within pattern unification. Chapter 6 describes a compilation based implementation
of the λProlog language. A detailed discussion is included of the way in which
pattern unification can be integrated into the WAM-based computation model. In 
Chapter 7, an efficient runtime type processing scheme is proposed together with ac- 
companying static optimization processes and their integration into the compilation 
based processing model. The actual software system, Teyjus Version 2, that realizes 
the conceptual design ideas of this thesis is the focus of Chapter 8. This chapter 
also discusses the practical issues faced in the implementation of this software sys- 
tem, such as the realization of the properties of portability and openness of code. 
An assessment of the design and of the performance of Teyjus Version 2 is the topic 
of Chapter 9. Experimental data is presented and analyzed here towards providing 
a quantitative understanding of the impact of our conceptual design ideas. Finally, 
Chapter 10 concludes the thesis with a discussion of some future directions. 



Chapter 2 



The λProlog Language

In this chapter we provide an overview of the higher-order logic programming
language λProlog whose implementation will be the subject of the rest of the thesis. The
foundation for this language is provided by a subclass of formulas in an intuitionistic
version of Church's higher-order logic [10]. This class of formulas, known as higher-order
hereditary Harrop formulas, enhances the collection of first-order Horn clauses
that underlie conventional logic programming languages like Prolog in several
significant ways. In particular, the enriched formulas allow the arguments of predicates to
be λ-terms rather than just first-order terms, they permit implications and universal
quantifiers to be used in queries thereby giving rise to new search primitives and
they support higher-order programming by including quantification over function and
(limited occurrences of) predicate symbols. By exploiting these additions, λProlog
provides strong support for what has come to be called the higher-order abstract
syntax approach to representing formal syntactic objects [57].

Our presentation of λProlog below mixes a description of its theoretical basis with 
a feeling for programming in the language. In Section 2.1 we recall the simply typed 
λ-calculus upon which the higher-order logic of interest is based, presenting its terms 
in the way a λProlog user would encounter them. In Section 2.2 we introduce the 
higher-order hereditary Harrop formulas and also describe at a high level the computational 
interpretation that λProlog associates with these formulas. In Section 2.3 we illustrate 
the idea of higher-order abstract syntax and the support that λProlog provides for 
this approach by considering an extended example. The discussion in the first three 
sections assumes a simple monomorphic typing system for λProlog. In reality, the 
language allows for polymorphic typing. We discuss this aspect in the last section of 
this chapter. 

2.1 The Simply Typed λ-Calculus 

The logic underlying λProlog is based on a polymorphically typed version of the 
simply typed λ-calculus. The types used in this calculus are constructed from sorts 
and type variables by recursive applications of type constructors. For simplicity, we 
initially restrict our attention to a simple, monomorphically typed version of this 
calculus by leaving out the usage of type variables. We eventually add these type 
variables to the language in Section 2.4. 

In the interpretation used here, we assume given a set of sorts and another set of 
type constructors each element of which is specified with an arity. The types in the 
language are then described through the following rules: 

1. Each sort s is a type; 

2. (c τ₁ ... τₙ) is a type provided c is a type constructor of arity n and τ₁, ..., τₙ 
are types; 

3. If τ₁ and τ₂ are types, then τ₁ → τ₂ is a type. 

The type defined by the last rule is viewed as a function type, where → is called the 
function type constructor. Types other than function types are called atomic. In the 
following discussions, the usage of parentheses is minimized by assuming that → is 
right associative and has a lower priority than other type constructors. Under these 
assumptions, any function type can be elaborated as α₁ → ... → αₙ → β, where β is 
atomic. We call α₁, ..., αₙ the argument types of such a type and we refer to β as its 
target type. 

From the programming perspective, the λProlog language starts out with a set of 
"built-in" sorts and type constructors. This set contains o, the type of propositions, 
and other primitive types like int, real and string with the obvious meanings. It also 
includes a unary type constructor list, which is used to form types of homogeneous lists. 
The programmer can add to these sets of sorts and type constructors by using 
declarations that have the following form: 

kind c type -> ... -> type. 

Such a declaration associates with the symbol c an arity that is one less than the 
number of the occurrences of the keyword type in it, and c is considered a sort when 
its arity is zero. As a concrete example, the following declarations define a binary 
type constructor pair and a sort i. 

kind pair type -> type -> type. 
kind i type. 

Based on the enhanced sets of sorts and type constructors, (pair int i), (list int → o) 
and (pair i (i → i)) are all legal types. 
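
The type formation rules and the notions of argument types and target type can be illustrated with a short sketch. The Python code below is only an illustration under stated assumptions: the class names Atomic and Arrow, the table KINDS (standing in for the effect of kind declarations) and the helpers well_formed and split_arrow are hypothetical names chosen here, not part of λProlog or of the implementation described in this thesis.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atomic:
    """A sort (no arguments) or a type constructor applied to argument types."""
    name: str
    args: Tuple["Type", ...] = ()


@dataclass(frozen=True)
class Arrow:
    """A function type: left -> right."""
    left: "Type"
    right: "Type"


Type = Union[Atomic, Arrow]

# Hypothetical kind table (name -> arity), mirroring the built-in sorts plus
# the declarations `kind pair type -> type -> type.` and `kind i type.`
KINDS = {"o": 0, "int": 0, "real": 0, "string": 0, "list": 1, "pair": 2, "i": 0}


def well_formed(t: Type) -> bool:
    """Check a type against the three formation rules."""
    if isinstance(t, Arrow):
        return well_formed(t.left) and well_formed(t.right)
    # A sort is a constructor of arity 0; otherwise the arity must match.
    return KINDS.get(t.name) == len(t.args) and all(well_formed(a) for a in t.args)


def split_arrow(t: Type):
    """Return (argument types, target type), reading -> as right associative."""
    args = []
    while isinstance(t, Arrow):
        args.append(t.left)
        t = t.right
    return args, t
```

For instance, split_arrow applied to the encoding of i → (i → i) → o yields the argument types i and i → i together with the atomic target type o.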

Assuming sets of typed constants and variables, the terms of the simply typed 
λ-calculus are identified together with their types through the following rules: 

1. a constant or a variable of type τ is a term of type τ; 

2. the expression (λx t) is a term of type τ₁ → τ₂ provided x is a variable of type 
τ₁ and t is a term of type τ₂; 

3. the expression (t₁ t₂) is a term of type τ provided t₁ and t₂ are terms of type 
τ₁ → τ and τ₁ respectively. 

Terms defined by the second and third rules are called abstractions and applications 
respectively. We minimize the usage of parentheses by assuming that applications 
are left associative and that abstractions have higher precedence than applications. 

Abstractions are of special interest among the categories of terms, because it is 
they that endow the language with the ability to explicitly represent binding. From a 
scoping perspective, an abstraction term of the form λx t captures the concept that 
x is a variable that ranges over t. From the perspective of meaning, such a term can 
be understood as a function definition in which x is the formal parameter and t is the 
function body, i.e., supplied with an actual parameter, say t₂, the evaluation result 
of this function should be a variation of t in whose structure the occurrences of x are 
replaced by t₂. Such an evaluation process is encompassed by an application term 
(t₁ t₂) where t₁ denotes a function definition and t₂ an actual parameter. 

The intended meanings of λ-terms are made formal by defining a notion of equality 
between them that takes into account the binding and functional character of 
abstractions discussed above. The formation rules for these terms give rise to a 
natural notion of subcomponents or subterms. Further, let us say that an occurrence 
of a variable y is bound or free in a term t depending on whether or not it appears 
within a subterm of the form λy t′ and that a variable is free or bound in t if it has 
a free or bound occurrence in it. Finally, let t[x := s] denote the result of replacing 
all the free occurrences of x by s in t, where t and s are terms and x is a variable of 
the same type as that of s. In this context the rules of λ-conversion that identify the 
desired equality notion are defined as follows: 

(α-conversion) Replacing a subterm of the form λx t of a given term with λy (t[x := y]), 
provided y is a variable with the same type as that of x that does not occur in t. 

(β-conversion) Replacing a subterm of the form (λx t) s of a given term with t[x := s] 
or vice versa, provided that for every free variable y of s, y does not have a bound 
occurrence in t. The subterm (λx t) s is known as a β-redex. 

(η-conversion) Replacing a subterm of the form λx (t x) of a given term with t or vice 
versa, provided t is of type α → β and x is a variable of type α that does not appear 
free in t. 

The rule of α-conversion recognizes the irrelevance of the names of bound variables 
in an abstraction. For example, the terms (λx x) and (λy y) encode the same identity 
function despite the different names given to its formal parameter. The 
β-conversion rule formalizes the notion of function evaluation discussed earlier. This 
rule initially seems limited because its application requires s not to have free variables 
that are bound in t. However, if this condition is not satisfied at the beginning then 
a sequence of α-conversions can be used to rename the bound variables in t to avoid 
the name collisions. The η-conversion rule encompasses the common assumption in 
mathematics that the functions f and g are equal if for every term t of a suitable 
type, the function applications (f t) and (g t) are equal. 
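
The interplay between α- and β-conversion described above can be made concrete with a small sketch. The Python code below is an illustration only and is not the representation used in this thesis (the implementation relies on the explicit substitution representation of Chapter 3). It assumes untyped, named terms encoded as tuples, and the helper names free_vars, subst, normalize and alpha_eq are hypothetical. Types are ignored, so normalize is only guaranteed to terminate on terms that could be simply typed.

```python
import itertools


def var(x): return ("var", x)
def const(c): return ("const", c)
def app(f, a): return ("app", f, a)
def lam(x, body): return ("lam", x, body)


def free_vars(t):
    """The set of names of variables that occur free in t."""
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "app":
        return free_vars(t[1]) | free_vars(t[2])
    if tag == "lam":
        return free_vars(t[2]) - {t[1]}
    return set()  # constants have no free variables


def subst(t, x, s):
    """Compute t[x := s], renaming bound variables of t (alpha-conversion)
    whenever they would capture a free variable of s."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == x else t
    if tag == "app":
        return app(subst(t[1], x, s), subst(t[2], x, s))
    if tag == "lam":
        y, body = t[1], t[2]
        if y == x:
            return t  # x is rebound here, so it has no free occurrence below
        if y in free_vars(s):
            avoid = free_vars(s) | free_vars(body) | {x}
            z = next(n for n in (f"{y}_{i}" for i in itertools.count())
                     if n not in avoid)
            body, y = subst(body, y, var(z)), z
        return lam(y, subst(body, x, s))
    return t


def normalize(t):
    """Contract beta-redexes until none remain."""
    tag = t[0]
    if tag == "lam":
        return lam(t[1], normalize(t[2]))
    if tag == "app":
        f = normalize(t[1])
        if f[0] == "lam":  # (lam x. body) applied to t[2] is a beta-redex
            return normalize(subst(f[2], f[1], t[2]))
        return app(f, normalize(t[2]))
    return t


def alpha_eq(t, u, pairs=()):
    """Equality up to the names of bound variables."""
    if t[0] != u[0]:
        return False
    if t[0] == "var":
        for a, b in pairs:  # innermost binder first
            if a == t[1] or b == u[1]:
                return a == t[1] and b == u[1]
        return t[1] == u[1]
    if t[0] == "const":
        return t[1] == u[1]
    if t[0] == "app":
        return alpha_eq(t[1], u[1], pairs) and alpha_eq(t[2], u[2], pairs)
    return alpha_eq(t[2], u[2], ((t[1], u[1]),) + pairs)
```

Normalizing the encoding of (λx λy x) y shows why the renaming step matters: the result is α-equal to λz y, whereas a naive replacement would produce λy y and capture the free variable y.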

A pair of λ-terms are considered equal if they can be obtained from each other by 
a sequence of applications of the α-, β- or η-rules. The computation underlying λProlog 
is in fact organized around a process of comparing λ-terms based on this notion 
of equality, and this process is known as unification. The concept of unification 
will be discussed in detail in Chapter 4. For now, we can simply understand it as a 
matching process during which variables that are free at the top-level of the terms can 
be replaced by some other term structures in attempting to make the terms equal. A 
key requirement in such a replacement, however, is that we cannot introduce variable 
occurrences that get captured by abstractions occurring in the term into which the 
replacement is done. 

The last issue with regard to understanding the data structures of λProlog concerns 
the usage of constants from the programming perspective. The set of constants of 
this language can be partitioned into two sub-categories: logical and non-logical 
ones. The language gives fixed interpretations to the logical constants, and they can be 
used to construct high level computation control. This set of constants consists of the 
symbol ⊤ of type o, denoting the tautological proposition, the symbols ∧, ∨ and ⊃ of 
type o → o → o, corresponding to logical conjunction, disjunction and implication 
respectively, and sets of symbols Π_α and Σ_α of type (α → o) → o for each type 
α. The last two (families of) logical constants are used to construct universal and 
existential quantifications: formulas usually written as ∀x t and ∃x t are encoded as 
Π_α λx t and Σ_α λx t, where x is a variable of type α. The type subscripts associated 
with these constants will be left out when they are not essential to our discussion. 
Further, when the context is clear, we will still use the conventional ∀x t and ∃x t 
representations for quantifications, and use ∧, ∨ and ⊃ as infix operators for better 
readability. 

Constants other than the logical ones belong to the non-logical set. Built-in 
support is provided for a primary collection of these, and users can extend this set 
by defining their own in the course of programming. The initial set of non-logical 
constants consists of the sets of integers, real numbers, strings (character sequences 
enclosed by double quotes), nil_α of type (list α) and the right-associative binary infix 
operator ::_α of type (α → list α → list α). The last two (families of) constants are 
used for encoding homogeneous lists of element type α, e.g. an integer list can be 
denoted as (1 ::_int 2 ::_int nil_int). Again, the type annotations of :: and nil will be 
omitted when the context is clear. 

Users can define new non-logical constants together with their types through 
declarations of the following kind 

type const <type>. 

where <type> should be replaced by the actual type of the constant. Such declarations 
will typically be used when a new set of constants is needed for encoding objects 
that need to be computed over. As a concrete example, suppose that our computational 
task requires us to represent the collection of closed untyped λ-terms built 
from the sole constant symbol a. The following declarations then identify the required 
symbols within λProlog to realize an encoding of such terms: 

kind tm type. 
type a tm. 
type app tm -> tm -> tm. 
type abs (tm -> tm) -> tm. 

A sort tm is first declared as the type of the set of object-level terms, i.e., the set 
of terms to be represented. The second line above declares a constant a as the 
only object-level constant term. Constants app and abs are the selected constructors 
for denoting object-level applications and abstractions respectively: an object-level 
application can be formed by applying app to two arguments of type tm, whereas 
an object-level abstraction is denoted by applying abs to a meta-level abstraction 
of type tm → tm. Within such a setup, an object-level term (λx (a x)) (λy y) 
can be represented as app (abs (λx (app a x))) (abs (λy y)). Based on the above 
representations, now we can think of realizing operations over the object-level terms. 
For example, suppose a copy operation, whose functionality is to duplicate a given 
object-level term, is of interest. We can declare a predicate constant, i.e., a constant 
with proposition target type, named copy for this purpose. 

type copy tm -> tm -> o. 

We expect that this predicate evaluates to true if and only if its first and second 
arguments are identical to each other. Such functionality can be specified through 
definitions of predicates constructed by formulas in our language, which are discussed 
in the next section. 

2.2 Higher-Order Hereditary Harrop Formulas 

The language of higher-order hereditary Harrop or hohh formulas is determined by 
two special classes of expressions: the G-formulas that function as goals or queries in 
a logic programming setting and the D-formulas that function as program clauses or 
definition clauses in this context. These formulas are essentially subsets of λ-terms 
of type o that are constructed from recursive applications of logical constants with 
certain restrictions. 

Using the symbol P to denote a non-logical constant or a variable, we define an atomic 
formula as a term of type o with the structure (P t₁ ... tₙ), where, for 1 ≤ i ≤ n, the 
only logical constants appearing in each tᵢ are ∧, ∨, Σ or Π; a term satisfying such 
a restriction is referred to as a positive term. If the head P of an atomic formula is a 
variable, the formula is said to be flexible and otherwise it is said to be rigid. Using 
the symbol A to denote atomic formulas and A_r to denote rigid atomic formulas, the 
sets of goals G and program clauses D are identified by the following syntactic rules: 

G ::= ⊤ | A | G ∧ G | G ∨ G | ∃x G | ∀x G | D ⊃ G 
D ::= A_r | G ⊃ A_r | D ∧ D | ∀x D 

In a program clause of the form A_r or G ⊃ A_r, A_r is called the head of the clause, and 
for a clause of the latter form, G is said to be its body. The goals of the forms ∀x G 
and D ⊃ G are called generic and augment goals respectively. 

A program in λProlog is a set of closed clauses, i.e., a set of clauses that do not 
contain any free variables. Computation in λProlog corresponds to solving a top-level 
closed query against a given program and relative to a given signature that identifies 
the set of available constants. The program at the beginning consists of all the clauses 
that the user of λProlog has provided at the top-level and the signature consists of 
all the built-in constants as well as those identified through user declarations. The 
manner in which the computation proceeds is dictated by the top-level structure of 
the query as indicated by the rules below. 

1. The goal ⊤ leads to an immediate solution regardless of the program and the 
signature. 

2. The goal G₁ ∧ G₂ is solved against any program and signature by solving both 
G₁ and G₂ using the same program and signature. 

3. The goal G₁ ∨ G₂ is solved against any program and signature by solving one of 
G₁ or G₂ using the same program and signature. 

4. The goal ∃x G is solved against a program and a signature by picking a closed 
term t of the same type as x that is constructed using only the constants in the 
given signature and then solving G[x := t] from the same program and signature; 
notice that the correctness of the replacement of x by t here is guaranteed 
by the fact that t is closed. 

5. The goal ∀x G is solved against a given program 𝒫 and signature Σ by selecting 
a constant c of the same type as x that does not belong to Σ and then solving 
G[x := c] against the program 𝒫 and the signature Σ ∪ {c}. 

6. The goal D ⊃ G is solved against a program 𝒫 and a signature Σ by solving G 
against the program 𝒫 ∪ {D} and the signature Σ. 

7. The rigid atomic goal A_r is solved from a program 𝒫 and a signature Σ by 
picking a clause from 𝒫, instantiating all the top-level universally quantified 
variables in it with closed terms constructed using only constants in Σ to get 
a formula that λ-converts to the form A_r or G ⊃ A_r and, in the latter case, 
solving the goal G from the program 𝒫 and signature Σ. 

An important point to note with regard to the rules presented above is that they can 
lead, in particular instances, to changes in the program and the signature against 
which a query is to be solved. In particular, a generic goal can extend the signature 
and an augment goal can lead to additions to the program. These kinds of goals thus 
have the ability to give names and clauses a scope over particular computations. This 
situation is to be contrasted with the usual Horn clauses that underlie Prolog; generic 
and augment goals are not permitted in that setting and consequently the scoping 
ability in question is absent there. 
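
The scoping effect of augment goals can be seen in a deliberately small sketch. The Python code below covers only rules 1, 2, 3, 6 and 7, and only for the propositional case: atoms are plain strings, there are no quantifiers and no unification, so the signature plays no role. The tuple encoding, the function name solve and the depth bound (a crude substitute for loop detection) are assumptions made for this illustration and do not reflect the implementation developed in this thesis.

```python
TRUE = ("true",)


def atom(name):
    return ("atom", name)


def solve(goal, program, depth=25):
    """Decide whether `goal` is solvable from `program`.

    `program` is a tuple of clauses (body, head), each standing for
    body => head; a fact is a clause whose body is TRUE.
    """
    if depth == 0:
        return False  # give up on very deep (possibly looping) derivations
    tag = goal[0]
    if tag == "true":  # rule 1
        return True
    if tag == "and":  # rule 2
        return solve(goal[1], program, depth) and solve(goal[2], program, depth)
    if tag == "or":  # rule 3
        return solve(goal[1], program, depth) or solve(goal[2], program, depth)
    if tag == "imp":  # rule 6: augment goal D => G
        return solve(goal[2], program + (goal[1],), depth)
    if tag == "atom":  # rule 7: backchain on clauses with a matching head
        return any(head == goal[1] and solve(body, program, depth - 1)
                   for body, head in program)
    raise ValueError(f"unknown goal: {goal!r}")


# p.    r :- p, q.
PROGRAM = ((TRUE, "p"), (("and", atom("p"), atom("q")), "r"))
```

With PROGRAM as above, the goal r fails, but the augment goal q ⊃ r, encoded as ("imp", (TRUE, "q"), atom("r")), succeeds because the clause q is available while r is being solved. The clause is not retained afterwards: a later call to solve with the goal q and the original program still fails.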

The above description of the operational semantics for λProlog is not yet suitable 
to be used as a basis for implementation. First, we are assuming an oracle in picking 
a proper instance for existentially quantified variables in queries and universally 
quantified variables in clauses. Second, we have not specified how to select clauses 
for solving rigid atomic goals when multiple possibilities exist, nor have we said how 
to select the disjunct to solve when processing disjunctive goals. Finally, a practical 
means is needed for controlling the visibility of constants and clauses introduced 
in generic and augment goals. We defer a discussion of these issues till Chapter 4, 
hoping that enough details have been provided here to make clear when a particular 
computation has been correctly carried out. 

The new scoping mechanisms present in hohh formulas provide the ability to realize 
recursion over abstractions in λ-terms and, thus, over binding structures present 
in object languages over which we are interested in describing computations. To 
illustrate this capability, we consider the copy predicate introduced in the previous 
section and show how it can be defined in the λProlog language. 

Assuming the representation for λ-terms that we have already presented, it is 
very natural to define the copying computation for the constant and for applications with 
the following two clauses: 

(copy a a) 

(∀t₁∀t₂∀t₃∀t₄ ((copy t₁ t₃ ∧ copy t₂ t₄) ⊃ copy (app t₁ t₂) (app t₃ t₄))) 

These clauses simply state that a copy of the constant a is the constant itself, and 
copying an application can be carried out by constructing a new application over 
the copies of its arguments. Now we need to consider how to recursively copy an 
abstraction of form (abs (λx t)). Intuitively, we would like to have a clause of form 

∀t₁∀t₂ (copy t₁ t₂ ⊃ copy (abs t₁) (abs t₂)) 

to descend into the argument of abs. However, this clause is illegal because the 
arguments of copy should have type tm whereas the argument of abs has type 
tm → tm, which essentially corresponds to an abstraction λx t. A more careful consideration 
reveals that the copy of λx t in fact can be realized by first constructing a copy for 
t[x := c] where c is a new constant, and then constructing an abstraction over the 
structure that results from extracting c out of this copy. These operations can be 
easily expressed by using a generic goal. In particular, consider the clause 

∀t₁∀t₂ ((∀c copy (t₁ c) (t₂ c)) ⊃ copy (abs t₁) (abs t₂)). 

The generic goal that appears in this clause will lead to the introduction of a new 
constant c. By applying t₁, which essentially corresponds to an abstraction λx t, to 
c, the substitution t[x := c] is automatically taken care of. Once this structure has 
been copied, the control of the scope of c embodied in the generic goal ensures that 
the only correct instantiation of t₂ would be one that extracts c out of t[x := c] and 
constructs an abstraction over it. Thus the recursion over abstractions in defining 
copy is accomplished by the use of a generic goal. However, our program is still not 
entirely correct because there is no clause so far specifying how to copy the constant 
c introduced by the generic goal. The computation itself is very simple and can be 
specified by a clause of form copy c c, but this clause cannot be simply added into our 
program at the top-level because the constant c is only visible inside the generic goal 
we discussed above. The solution is to enhance the clause for copying abstractions 
by the use of an augment goal: 

∀t₁∀t₂ ((∀c (copy c c ⊃ copy (t₁ c) (t₂ c))) ⊃ copy (abs t₁) (abs t₂)). 

Now the clause copy c c has its scope inside that of c, so that it is effective only when 
computation descends into the body of an abstraction. 

copy a a. 
copy (app T1 T2) (app T3 T4) :- copy T1 T3, copy T2 T4. 
copy (abs T1) (abs T2) :- Pi c\ (copy c c => copy (T1 c) (T2 c)). 

Figure 2.1: A program defining the predicate copy. 

Figure 2.2: Encoding the logical symbols in an object logic. 



We shall find it convenient to use in the rest of this thesis a Prolog-like syntax 
in presenting program clauses that are meant to constitute λProlog programs. First, 
we always omit top-level conjunctions in a program and use a period to 
terminate top-level clauses. Second, we use capitalized names for universally 
quantified variables over top-level clauses and for existentially quantified variables over 
top-level goals and leave the quantifiers implicit. Third, we use the syntax A_r :- G. 
to denote top-level clauses of the form G ⊃ A_r. Finally, comma and semicolon will 
be used to denote ∧ and ∨ respectively. Based on these conventions and using the 
concrete syntax => for ⊃, Pi for ∀, and the infix operator \ for λ, the copy program 
that we have just described would be presented concretely as in Figure 2.1. 
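
The way the copy program recurses over abstractions can be mimicked in a functional setting. The Python sketch below is an analogy only, and it replaces proof search by a deterministic function. Meta-level abstractions are represented by Python functions, the generic goal becomes the creation of a fresh constant, and the augment goal copy c c becomes the extension of a set of constants that may be copied. All names (const, app, abs_, replace_const, copy, same) are hypothetical. The sketch assumes that the fresh names c0, c1, ... are not used as object-level constants.

```python
import itertools

_counter = itertools.count()


def const(name): return ("const", name)
def app(t1, t2): return ("app", t1, t2)
def abs_(f): return ("abs", f)  # f is a Python function from terms to terms


def replace_const(t, c, x):
    """Replace every occurrence of the constant named c in t by the term x."""
    tag = t[0]
    if tag == "const":
        return x if t[1] == c else t
    if tag == "app":
        return app(replace_const(t[1], c, x), replace_const(t[2], c, x))
    return abs_(lambda y: replace_const(t[1](y), c, x))


def copy(t, scoped=frozenset()):
    """Copy an encoded term. `scoped` holds the constants introduced by the
    enclosing generic goals, i.e. those for which `copy c c` is available."""
    tag = t[0]
    if tag == "const":
        if t[1] == "a" or t[1] in scoped:
            return t
        raise ValueError(f"no clause for constant {t[1]!r}")
    if tag == "app":
        return app(copy(t[1], scoped), copy(t[2], scoped))
    c = f"c{next(_counter)}"                   # generic goal: a fresh constant
    body = copy(t[1](const(c)), scoped | {c})  # augment goal: copy c c in scope
    return abs_(lambda x: replace_const(body, c, x))  # abstract c out again


def same(t, u):
    """Compare two encoded terms by applying abstractions to fresh constants."""
    if t[0] != u[0]:
        return False
    if t[0] == "const":
        return t[1] == u[1]
    if t[0] == "app":
        return same(t[1], u[1]) and same(t[2], u[2])
    probe = const(f"c{next(_counter)}")
    return same(t[1](probe), u[1](probe))
```

For example, copying the encoding app(abs_(lambda x: app(const("a"), x)), abs_(lambda y: y)) of (λx (a x)) (λy y) yields a new term that same recognizes as equal to the original, whereas copy(const("b")) raises an error because no clause covers a constant other than a outside the scope in which it was introduced.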



2.3 An Extended Example 

We now provide a closer look at the power of λProlog and a better feeling for 
programming in it by considering an extended example of its use in a meta-programming 
task. The particular task we consider is that of encoding formulas from a first-order 
logic and realizing a syntactic transformation on them to produce their prenex normal 
forms, i.e., forms in which all the quantifiers appear at the head of the formula. 

The formulas that we want to encode will be from a logic that, as usual, is 
characterized by logical and non-logical symbols. The logical symbols that we assume 
here are ⊤, ⊥, ∧, ∨, ⊃, ∀ and ∃. We shall encode these by using the constants 
truth, false, and, or, imp, all and some, respectively. In encoding the quantifiers, we, 
once again, separate a treatment of their meanings from a treatment of their binding 
effects. Figure 2.2 contains a set of declarations that identify these constants; the 
type form is used in the encoding to correspond to the category of formulas. For the 
non-logical vocabulary, we shall assume that the object logic has three constants a, 
b and c and a single function symbol f with arity 1. Beyond this, we assume two 
binary predicate symbols adj and path; intuitively, these symbols serve to describe 
graphs, with the first being used to describe an adjacency relation and the second 
the relation corresponding to the existence of a path between two nodes. Using the 
type term to represent object logic terms, the declarations in Figure 2.3 provide an 
encoding of this non-logical vocabulary. 

Figure 2.3: Encoding the non-logical symbols in an object logic. 

We illustrate our encoding of formulas by considering the representation of the fol- 
lowing object-level formula that describes a graph with four nodes and that describes 
the path relation in terms of the adj relation: 

adj(a, b) ∧ 
adj(b, c) ∧ 
adj(c, f(c)) ∧ 
(∀x∀y (adj(x, y) ⊃ path(x, y))) ∧ 
(∀x∀y∀z ((adj(x, y) ∧ path(y, z)) ⊃ path(x, z))) 

The λProlog term that represents this formula is the following: 

(and (adj a b) 
     (and (adj b c) 
          (and (adj c (f c)) 
               (and (all x\ (all y\ (imp (adj x y) (path x y)))) 
                    (all x\ (all y\ (all z\ (imp (and (adj x y) (path y z)) 
                                                 (path x z))))))))). 






type is_term term -> o. 
is_term a. 
is_term b. 
is_term c. 
is_term (f X) :- is_term X. 

type is_atomic form -> o. 
is_atomic (adj X Y) :- is_term X, is_term Y. 
is_atomic (path X Y) :- is_term X, is_term Y. 

type quantifier_free form -> o. 
quantifier_free truth. 
quantifier_free false. 
quantifier_free A :- is_atomic A. 
quantifier_free (and A B) :- quantifier_free A, quantifier_free B. 
quantifier_free (or A B) :- quantifier_free A, quantifier_free B. 
quantifier_free (imp A B) :- quantifier_free A, quantifier_free B. 



Figure 2.4: Some recognizers for encodings of object logic categories. 

With this representation in place, we consider the specifications of the simple 
properties of being (the encodings of) a term, an atomic predicate and a quantifier 
free formula. Predicates recognizing these attributes of λProlog terms are presented in 
Figure 2.4; the names is_term, is_atomic and quantifier_free are used for recognizers 
for each of these categories, respectively. 

We now consider the encoding of the prenex normal form relation. Specifically we 
are interested in writing down a set of program clauses that define a predicate prenex 
such that a goal of the form prenex A B is solvable from them just in the case that 
A and B are both encodings of formulas and, further, B represents a prenex normal 
form of the formula represented by A. The definition of this predicate is presented in 
Figure 2.5. Use is made in this definition of an auxiliary predicate mrg for the purpose 
of raising quantifiers over binary connectives. The definitions of both prenex and 
mrg use generic and augment goals in a fashion already illustrated with the definition 
of the copy predicate to analyze and synthesize abstraction structures so as to realize 
a recursion over the representation of quantified formulas. 

type prenex form -> form -> o. 
prenex truth truth. 
prenex false false. 
prenex B B :- is_atomic B. 
prenex (and B C) D :- prenex B U, prenex C V, mrg (and U V) D. 
prenex (or B C) D :- prenex B U, prenex C V, mrg (or U V) D. 
prenex (imp B C) D :- prenex B U, prenex C V, mrg (imp U V) D. 
prenex (all B) (all D) :- Pi x\ (term x => prenex (B x) (D x)). 
prenex (some B) (some D) :- Pi x\ (term x => prenex (B x) (D x)). 

type mrg form -> form -> o. 
mrg (and (all B) (all C)) (all D) :- 
    Pi x\ (term x => mrg (and (B x) (C x)) (D x)). 
mrg (and (all B) C) (all D) :- Pi x\ (term x => mrg (and (B x) C) (D x)). 
mrg (and (some B) C) (some D) :- Pi x\ (term x => mrg (and (B x) C) (D x)). 
mrg (and B (all C)) (all D) :- Pi x\ (term x => mrg (and B (C x)) (D x)). 
mrg (and B (some C)) (some D) :- Pi x\ (term x => mrg (and B (C x)) (D x)). 
mrg (or (some B) (some C)) (some D) :- 
    Pi x\ (term x => mrg (or (B x) (C x)) (D x)). 
mrg (or (all B) C) (all D) :- Pi x\ (term x => mrg (or (B x) C) (D x)). 
mrg (or (some B) C) (some D) :- Pi x\ (term x => mrg (or (B x) C) (D x)). 
mrg (or B (all C)) (all D) :- Pi x\ (term x => mrg (or B (C x)) (D x)). 
mrg (or B (some C)) (some D) :- Pi x\ (term x => mrg (or B (C x)) (D x)). 
mrg (imp (all B) (some C)) (some D) :- 
    Pi x\ (term x => mrg (imp (B x) (C x)) (D x)). 
mrg (imp (all B) C) (some D) :- Pi x\ (term x => mrg (imp (B x) C) (D x)). 
mrg (imp (some B) C) (all D) :- Pi x\ (term x => mrg (imp (B x) C) (D x)). 
mrg (imp B (all C)) (all D) :- Pi x\ (term x => mrg (imp B (C x)) (D x)). 
mrg (imp B (some C)) (some D) :- Pi x\ (term x => mrg (imp B (C x)) (D x)). 
mrg B B :- quantifier_free B. 

Figure 2.5: A specification of the prenex-normal form relation. 

Given the program in Figure 2.5, the query 

?- prenex (or (all x\ (and (adj x x) (and (all y\ (path x y)) (adj (f x) c)))) 
              (adj a b)) 
          Pnf. 

should succeed by instantiating the top-level existentially quantified variable Pnf to 
the term 

(all x\ (all y\ (or (and (adj x x) (and (path x y) (adj (f x) c))) (adj a b)))). 

For another example, the query 

?- prenex (and (all x\ (adj x x)) (all z\ (all y\ (adj z y)))) Pnf. 

is also solvable with any one of the following five instantiations for the variable Pnf: 

all z\ (all y\ (and (adj z z) (adj z y))), 

all z\ (all x\ (and (adj x x) (adj z x))), 

all x\ (all z\ (all y\ (and (adj x x) (adj z y)))), 

all z\ (all x\ (all y\ (and (adj x x) (adj z y)))), and 

all z\ (all y\ (all x\ (and (adj x x) (adj z y)))). 

The multiple solutions listed above are a result of the existence of multiple matching 
clauses when solving atomic goals in the course of computation. 

We have only considered one example of the use of λProlog in encoding computations 
over binding structures but, hopefully, this example will provide the background 
for understanding our later discussions about implementation. An interested reader 
can find several other examples in the literature; such examples and a discussion of 
the meta-programming capabilities of the language may be found, for instance, in 
[44]. In realizing such computations we will have to find an effective way of treating 
varied aspects such as search and the selection of instantiation terms, issues that we 
have ignored in the presentation here as noted already. We will take these issues up 
seriously in Chapter 4. Anticipating that discussion, we note that the examples of 
prenex and copy both belong to the L_λ fragment of λProlog programming, a class for 
which the unification computation is decidable and admits unique solutions and for 
which we are interested in developing a good treatment in this thesis. 

2.4 Polymorphism and the Role of Types in Computation 

Our presentation of λProlog up to now has treated it as if it is monomorphically 
typed. In reality, the type system of λProlog allows for polymorphism to provide 
flexibility and convenience in programming. Such polymorphism is obtained by 
admitting the use of type variables. In particular, in addition to the sets of sorts and 
type constructors, an infinite supply of type variables is also assumed. Sorts and type 
variables are basic types, starting from which constructed types, including function 
types, are built by recursive applications of type constructors. In the subsequent 
discussion, we use capital letters to denote type variables. 

Intuitively, a type variable can be viewed as an abbreviation of an infinite set of 
types in the monomorphic type system. Thus a type with type variables occurring 
inside in fact provides a schema for a family of types: sets of more specific types can 
be generated by replacing the contained type variables with other types. For instance, 
the polymorphic type list A denotes a family of list types such as list int for integer 
lists, list (list int) for lists of integer lists, and list (list B) for lists of lists whose elements 
are of type B, where B can again be instantiated by arbitrary types. Consequently, 
a constant declared with a polymorphic type can be viewed as an abbreviation of 
an infinite set of constants, each element of which has a monomorphic type that is an 
instance of the type schema. For example, previously we had families of empty lists 
nil_α and list constructors ::_α for each monomorphic type α. Now these sets can be 
abbreviated into two constants: nil of type list A and :: of type A → list A → list A. 
By instantiating the type variable A to int, an integer list can be denoted by 1 :: 2 :: 
nil. Note that the instantiation of the type variable has to be performed in a uniform 
manner across the entire polymorphic type. For example, a structure of form 1 :: 
"a" :: nil is not well-typed since the integer argument of the first :: requires its type 
variable to be replaced by int, whereas its second argument, a string list, demands that 
it be replaced by string instead. 
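
The requirement that a type variable be instantiated uniformly can be checked mechanically by first-order unification over types. The Python sketch below is only an illustration of that idea under assumptions made here: the tuple encoding of types and the helper names tvar, con, unify, resolve and type_of_list are hypothetical, and the sketch says nothing about how types are actually processed in the implementation (this is the topic of Chapter 7).

```python
import itertools

_fresh = itertools.count()


def tvar(): return ("var", next(_fresh))       # a fresh type variable
def con(name, *args): return ("con", name) + args


INT, STRING = con("int"), con("string")


def list_of(t): return con("list", t)


def walk(t, s):
    """Follow the bindings of substitution s starting from t."""
    while t[0] == "var" and t in s:
        t = s[t]
    return t


def occurs(v, t, s):
    t = walk(t, s)
    return t == v or (t[0] == "con" and any(occurs(v, a, s) for a in t[2:]))


def unify(t, u, s):
    """Return an extension of s that makes t and u equal, or None."""
    t, u = walk(t, s), walk(u, s)
    if t == u:
        return s
    if t[0] == "var":
        return None if occurs(t, u, s) else {**s, t: u}
    if u[0] == "var":
        return unify(u, t, s)
    if t[1] != u[1] or len(t) != len(u):
        return None
    for a, b in zip(t[2:], u[2:]):
        s = unify(a, b, s)
        if s is None:
            return None
    return s


def resolve(t, s):
    """Apply the substitution s throughout the type t."""
    t = walk(t, s)
    if t[0] == "con":
        return con(t[1], *(resolve(a, s) for a in t[2:]))
    return t


def type_of_list(element_types):
    """Type of e1 :: ... :: en :: nil given the types of e1, ..., en,
    or None if no uniform instantiation of the type variables exists."""
    s = {}
    result = list_of(tvar())                 # nil : list A, with A fresh
    for elem in reversed(element_types):
        a = tvar()                           # fresh copy of A -> list A -> list A
        s = unify(a, elem, s)
        s = None if s is None else unify(list_of(a), result, s)
        if s is None:
            return None
        result = list_of(a)
    return resolve(result, s)
```

Here type_of_list([INT, INT]) produces the encoding of list int, while type_of_list([INT, STRING]), which corresponds to 1 :: "a" :: nil, fails because int and string cannot be unified.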

The idea of using polymorphic types to abbreviate sets of constants can also be 
applied to clause definitions. An example of such usage is the predicate append,
which concatenates the lists in its first two arguments into the third one.

type append list A -> list A -> list A -> o.
append nil L L.
append (X :: L1) L2 (X :: L3) :- append L1 L2 L3.

The two clauses defining append are shared by the append operation of an infinite set 
of list types. From the programming perspective, this sort of polymorphism is known 
as parametric, where a function (predicate) works uniformly over a range of types. 

In addition to parametric polymorphism, λProlog embodies another sort of
polymorphism, known as ad hoc polymorphism, which is obtained when a function
(predicate) works in unrelated ways on several different types. An example
is provided by the following definitions of the predicate print, in which we assume
predicates write_int, write_string and write_list for printing given arguments of type
int, string and list A, respectively, to the standard I/O.

type print A -> o.

type write_int int -> o.
type write_string string -> o.
type write_list (list A) -> o.

print N :- write_int N.
print L :- write_list L.
print S :- write_string S.

In the execution of the above program, the dispatching to different write methods 
depends on the type of the first argument of print, which is inherited from the top- 
level print query. To achieve this effect, types should participate in the computation of 
solving goals. In particular, they are necessary for deciding the equality of constants, 
e.g., the dispatching in the print example is based on the fact that a constant print 
of integer type is different from those of string type or list types. 
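
The way types select among the print clauses can be sketched as follows. This Python fragment is only an illustration under stated assumptions: the table CLAUSES, the function applicable and the use of one-way matching against a fully instantiated query type are hypothetical simplifications, whereas an actual implementation unifies types as part of unifying the constants themselves.

```python
# Types: a sort is a lowercase string, a type variable is a capitalised
# string, a constructed type is a tuple such as ("list", "int").
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, ty, binding):
    """One-way matching of a clause's argument type against a ground type."""
    if is_var(pattern):
        if pattern in binding:
            return binding[pattern] == ty
        binding[pattern] = ty
        return True
    if isinstance(pattern, tuple) and isinstance(ty, tuple):
        return (len(pattern) == len(ty) and pattern[0] == ty[0]
                and all(match(p, x, binding) for p, x in zip(pattern[1:], ty[1:])))
    return pattern == ty

# Each print clause, recorded as (type of its argument, predicate in its body).
CLAUSES = [
    ("int", "write_int"),
    (("list", "A"), "write_list"),
    ("string", "write_string"),
]

def applicable(arg_type):
    """Bodies of the print clauses whose head type fits the query's type."""
    return [body for pattern, body in CLAUSES if match(pattern, arg_type, {})]
```

A query whose argument has type int selects only the first clause, and one whose argument has type list int selects only the second, which mirrors the observation that print at integer type and print at a list type are different constants.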

Based on the above discussion, it is clear that types play a two-fold role in our
language. First, they are used to ensure the correctness of programs and,
second, they participate in computation for deciding the solvability of queries. When
playing the first role, types are used to identify legitimate terms by restricting the
applicability of specific operations, thereby providing control over the computations
that can be attempted. From this perspective, the usage of our type system is very
similar to that of the functional language SML [13, 24]. Naturally, it can be expected
that this usage should be discharged at compilation time. In actual compilation-based
implementations of λProlog, compilers commonly use a type checking procedure for
this purpose; this procedure encompasses a process that infers types for every term
in a program from those declared with constants. The second role of types is
more peculiar to logic programming languages, where types are actually employed
during runtime computation and have an influence on the solutions [46]. The specific
way in which the computations are determined depends on the particular algorithm
used to realize the underlying unification operations, and will be clarified in
Chapter 4. It should be noted here that this usage requires types to be
manipulated during the execution of programs, which consequently poses a challenge
for efficient implementations of λProlog with regard to minimizing runtime overhead
of this sort. This thesis provides an optimized runtime type processing scheme
for the particular context where computation is organized around higher-order
pattern unification; it is discussed in Chapter 7.



Chapter 3 



Comparison of λ-Terms



The computational model described in the previous chapter requires the comparison
of atomic goals: in solving a goal of the form $A_r$, we have to find an instance of a clause
that is equal to $A_r$ or to $G \supset A_r$. Observe, however, that the notion of equality that is
involved here is richer than that used in first-order logic programming. In particular,
we are allowed to use the conversion rules of the λ-calculus in determining if the
instance of a clause has the required form. A question that must be addressed in
an implementation of our language, therefore, is how to carry out such a
determination effectively. As we discuss in this chapter, comparisons of this kind between terms
can be realized by first reducing them to a normal form. The process of transforming
a λ-term into a normal form is not trivial and must be given careful attention from
an efficiency perspective. An aspect that needs special consideration in
this context is the treatment of substitutions that are generated in the course of
reductions. We discuss the various issues involved in the overall comparison process in
this chapter, leading eventually to what is known as an explicit substitution notation
for λ-terms. This notation serves as a high-level representation for such
terms that we later refine into an actual machine-level implementation.

This chapter is structured as follows. In the first section we provide an overview
of the comparison of λ-terms, introducing in the process the idea of using normal
forms as the basis for such comparisons. Section 3.2 then discusses at a high level
the issue of carrying out β-reductions on terms in the course of producing normal
forms. This discussion highlights the importance of treating substitutions carefully
in the course of reduction. Section 3.3 presents an explicit substitution notation
for λ-terms that is known as the suspension calculus [20]; this notation provides the
basis for realizing the normalization of terms in a finely controlled way and is what
underlies the term representation we use in the implementation scheme developed in
this thesis. Section 3.4 contains some formal properties of the suspension calculus and
also lifts the ideas of normal forms and of rewriting sequences that produce normal
forms to the suspension calculus. This discussion underlies the reduction procedure
that is eventually used in the implementation to realize the comparison operation.
We conclude this chapter with a discussion of how the η-conversion rule can be taken
into account in the context of the suspension calculus.

3.1 Normal Forms and Term Comparison

Normal forms usually play an important role in the comparison of terms in a situation
where equality encompasses a richer notion than a simple check for syntactic identity.
In the context of the λ-calculus, a useful such form is what is known as a head normal
form. A term is said to be in such a form if, for some $n, m \geq 0$, it has the structure

$$(\lambda x_1 \ldots (\lambda x_n\, (\ldots (h\; t_1) \ldots t_m)) \ldots),$$

where $h$ is a constant or a variable, possibly in the set $\{x_1, \ldots, x_n\}$. We call the
abstractions at the front of such a term its binder, the atom $h$ its head
and $t_1, \ldots, t_m$ its arguments. Notice that, in particular instances, the binder might
be empty and the term may also not have any arguments. A special case of a head
normal form is one where each of its arguments recursively has this structure. A
term that satisfies this structure is said to be in β-normal form.

An alternative characterization of a β-normal form, one that is easily seen to be
equivalent to the one provided above, is that it is a term that does not have any
β-redexes as subterms. We can think of trying to convert an arbitrary λ-term to a
β-normal form by orienting the β-conversion rule. In particular, given a term that
has a subterm of the form $(\lambda x\, t)\; s$, we can first use α-conversions to rename the
bound variables in $t$ so that they are distinct from the free variables of $s$. If we obtain
the term $(\lambda x\, t')\; s$ from this process, we can then replace this subterm by the form
$t'[x := s]$. We shall refer to such a sequence of applications of the α-conversion rule
followed by the oriented application of the β-conversion rule as a β-contraction, and
we call a sequence of β-contractions a β-reduction. An important
property of the simply typed λ-calculus, which also carries over to the polymorphic
version of it that is used in λProlog, is that any β-reduction sequence that starts
from a given term must terminate [21]. It follows from this that every term in our
language can be converted to a β-normal form and hence also to a head normal form.
We shall refer to such a form as a β-normal (head normal) form for the term.

Two terms that have identical β-normal forms are obviously equal under the
λ-conversion rules. Ignoring for the moment the η-conversion rule, a converse of
this observation is also available by virtue of the Church-Rosser Theorem for the
λ-calculus [4]: two terms that are equal must have β-normal forms that differ only
in the names used for bound variables. We can use this observation to describe an
algorithm for comparing two terms that have the same type; it is only such terms
that we ever need to compare in the execution model for λProlog. First, we take the
two terms and convert them into head normal forms. At this stage, we compare their
binder lengths. If these are not equal, then the terms are not equal. Otherwise, using
a sequence of α-conversions, we can ensure that the names of the variables in the
two binders are identical; later we shall consider a nameless representation of bound
variables in the style of de Bruijn [8] that makes this renaming step redundant.
Now, if the heads of the two terms are distinct then the terms are once again unequal.
If, on the other hand, the heads are identical, then the typing assumption ensures
that they must have an equal number of arguments. The comparison of the two terms
now reduces to a pairwise comparison of their arguments.

The comparison algorithm that we have just described is, of course, inadequate in
the situation when the η-conversion rule is also included. However, a simple change
to it suffices in this richer context. After we have converted the two terms to head
normal forms, it may be the case that one of them has a shorter binder than the other.
In this case, our first task is to extend the length of the shorter binder. Suppose that
this term is of the form $\lambda x_1 \ldots \lambda x_n\, t$. Since the two terms have the same type and the
other one has a longer binder, $t$ must have a function type, i.e., its
type must be of the form $\alpha \to \beta$. But then the term under consideration is equal by
virtue of the η-conversion rule to the term $\lambda x_1 \ldots \lambda x_n\, \lambda x_{n+1}\, (t\; x_{n+1})$, where $x_{n+1}$ is
some variable that does not appear free in $t$. By repeated use of this transformation,
we can make the binders of the two terms of equal length. The comparison algorithm
now proceeds as before. The correctness of this algorithm follows from a version of
the Church-Rosser Theorem that applies to the situation where the η-rule is included.

3.2 Issues in the Realization of β-reduction

From the discussion in the previous section, it is clear that the reduction of a λ-term
to a head normal form is an important component of the term comparison
operation. However, the realization of this transformation is not trivial. Theoretical
presentations of the λ-calculus typically treat the substitution required in rewriting a
β-redex as an atomic operation. In particular, given a term of the form $(\lambda x\, t)\; s$, the
sequence of α-conversions that produces the term $(\lambda x\, t')\; s$, which is intended to avoid
the capture of free variables in $s$, and the subsequent rewriting to the form $t'[x := s]$ are
assumed to be achieved in a single step. However, from an implementation
perspective, this task is too complicated to be accomplished in one step. The actual
realization of this operation usually combines the renaming of the bound variables in
$t$ and the replacement of the free occurrences of $x$ by $s$ into one operation.
It then breaks this operation into smaller steps: an environment is maintained to
explicitly record the needed variable replacements and each rewriting step focuses
on propagating the environment over a specific sort of term structure. Specifically,
at the beginning of the performance of $t[x := s]$, the substitution $[x := s]$ is first registered in an
environment $e$. Then the rewriting task becomes that of propagating $e$ over $t$. The
interesting case arises when $t$ is of the form $\lambda y\, t'$. Now if $y$ does not occur free in $s$,
the same environment $e$ can simply be pushed inside the abstraction. Otherwise,
the occurrences of the variable $y$ in $\lambda y\, t'$ should be renamed to some $z$ such that $z$ does
not appear free in $s$. The renaming action $[y := z]$ is then accumulated into the
environment. As a result, we have an environment propagation step in this situation
that is given by a rewrite rule of the form

$$(\lambda y\, t')[x := s] \rightarrow \lambda z\, (t'[y := z, x := s]),$$

assuming $y$ appears free in $s$. When variable occurrences are finally encountered in the
course of carrying out the substitution, the replacement can actually be carried out according
to the information recorded in the environment.
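
The propagation of an environment over named terms can be sketched in a few lines of Python. This is an illustrative sketch rather than the scheme used later in this thesis: the tuple encoding of terms and the names free_vars, fresh_name and propagate are assumptions, the environment is a dictionary encoding a simultaneous substitution, and the fresh name is chosen to avoid the free variables of the abstraction body as well as those of the substituted terms.

```python
import itertools

# Named terms: ("var", x), ("app", t, s), ("lam", x, body).
def free_vars(t):
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "app":
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}

def fresh_name(avoid):
    """Pick the first name of the form z0, z1, ... that is not in avoid."""
    return next(f"z{i}" for i in itertools.count() if f"z{i}" not in avoid)

def propagate(t, env):
    """Apply the simultaneous substitution env (a dict name -> term) to t."""
    tag = t[0]
    if tag == "var":
        return env.get(t[1], t)          # replacement happens at variable occurrences
    if tag == "app":
        return ("app", propagate(t[1], env), propagate(t[2], env))
    y, body = t[1], t[2]
    # The binder y shadows any outer substitution for the same name.
    inner = {x: s for x, s in env.items() if x != y}
    captured = set().union(*(free_vars(s) for s in inner.values()))
    if y in captured:
        # y occurs free in a substituted term: rename it and record [y := z].
        z = fresh_name(captured | free_vars(body) | set(inner))
        inner[y] = ("var", z)
        y = z
    return ("lam", y, propagate(body, inner))
```

For instance, propagating {"x": ("var", "y")} over the encoding of $\lambda y\, (x\; y)$ renames the binder and returns the encoding of $\lambda z_0\, (y\; z_0)$, while propagating a substitution whose term does not mention $y$ leaves the binder unchanged.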

In the discussion above, we have thought of using an environment to encode
multiple simultaneous substitutions. Although the environment in the example we
have considered has exactly one substitution generated from rewriting a β-redex, it
is possible to imagine environments that have more than one such substitution. By
allowing for such environments, we obtain an ability to combine the term traversal
needed in effecting substitutions with the traversal needed for finding and reducing
β-redexes. As an example, consider the term $(\lambda x\, \lambda y\, t_1)\; t_2\; t_3$. This term can be
transformed through a sequence of contractions to the form $t_1[x := t_2, y := t_3]$.
The replacement in $t_1$ of $x$ by $t_2$ and $y$ by $t_3$ can now be done at the same time and
can also be combined with the identification and rewriting of further β-redexes within
$t_1$.

We have treated an environment or substitution up to now as an auxiliary device,
outside the term structure, to be used essentially in implementing reduction.
However, it is also possible to include substitutions explicitly in terms, treating a
term with a substitution also as a term; such a term is similar to the closures
used in implementing functional programming languages except that closures
are now also treated as first-class terms. If we allow substitutions to be used in this
manner, so that we permit the term $t$ in an environment of the form $[x := t]$ to
carry its own environment, then we also obtain the ability to delay the performance
of substitutions so as to carry them out in a demand-driven fashion, thereby further
enhancing the capability to combine reduction and substitution traversals of terms.
As an example, consider the term $(\lambda x\, ((\lambda y\, t_1)\; t_2))\; t_3$. This term can be rewritten
to the form $t_1[y := t_2[x := t_3], x := t_3]$. Notice that in this term we have delayed the
substitution of $t_3$ for $x$ in $t_2$. We may eventually need to reduce the term $t_2$ and the
mentioned substitution can then be carried out in the same traversal as is needed for
this reduction.

Implementations of functional programming languages typically use the idea of
environments to encode substitutions. A simplifying assumption that is used in these
contexts is that it is never necessary to look at term structure embedded within
abstractions. As a result of this assumption, there is never any need to rename bound
variables: the terms that are being substituted are never carried into a context where
their free variables may get bound. The assumption of not looking within abstractions is,
however, not acceptable in a situation where we have to compare arbitrary λ-terms.
For instance, to decide the inequality of the terms

$$(\lambda y\, ((\lambda x\, (\lambda y\, x))\; y)) \quad\text{and}\quad (\lambda y\, ((\lambda x\, (\lambda y\, y))\; y)),$$

β-redexes inside abstractions have to be rewritten. Combining renaming substitutions
with β-contraction substitutions seems not to be a problem when we use explicit
names for bound variables. However, the need to also consider α-convertibility in
comparing terms usually dictates that a nameless representation be used for bound
variables. In such a situation, the descent into abstraction contexts requires a lot
more care. This issue is specifically dealt with in explicit substitution calculi like the
suspension calculus that we discuss next.

3.3 The Suspension Calculus 

Before discussing the suspension calculus itself, we first introduce a notation
for λ-terms proposed by de Bruijn [8] that simplifies the task of checking for equality
under α-conversion. In this notation, an occurrence of a variable is denoted by a
positive number, called a de Bruijn index, which counts the number of abstractions
between this occurrence and the abstraction binding the variable, with the binding
abstraction itself included in the count. For example, the
term written as $(\lambda x\, (\lambda y\, (x\; y))\; x)$ in a name-based setting is denoted in the de
Bruijn notation by $(\lambda\, (\lambda\, (\#2\; \#1))\; \#1)$. It can, in fact, easily be seen that any pair
of α-convertible terms in the name-based notation have the same de Bruijn representation.
It should be noted that the bound variable renaming needed for the substitution
propagation discussed in the previous section is not really eliminated by the de Bruijn
notation but is, rather, transformed into the renumbering of de Bruijn indexes.
For example, when substitutions are pushed into an abstraction in the context of
the de Bruijn notation, two adjustments have to be properly reflected: first, the index
corresponding to the variable that will be substituted for should become one greater than that
recorded in the current substitutions and, second, the indexes corresponding to the
variables occurring free in the term that is to be substituted should be increased
by one. Moreover, when an environment-based reduction approach is under consideration,
a problem similar to the one discussed in the previous section also
exists in combining a substitution corresponding to a redex embedded in an abstraction,
e.g., $\lambda\, ((\lambda\, t)\; s)$, with the enclosing environment: in addition to the substitution of
$s$ for the first free variable in $t$, the decreasing of the indexes corresponding to the
other variables occurring free in $t$ should also be properly reflected in the environment.
The details of how the required renumbering tasks are accomplished in the context of
the suspension calculus, which is based on the de Bruijn representation, will become
clear in the discussion that follows.
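
The translation from named terms to de Bruijn terms can be sketched as follows. The Python function below is a minimal illustration, not code from the implementation described in this thesis; the tuple encodings and the decision to treat every name that is not bound as a constant are assumptions of the sketch.

```python
# Named terms:     ("var", x), ("app", t, s), ("lam", x, body).
# De Bruijn terms: ("idx", i), ("const", c), ("app", t, s), ("lam", body).
def to_de_bruijn(t, binders=()):
    """Translate a named term; binders lists enclosing bound names, innermost first."""
    tag = t[0]
    if tag == "var":
        name = t[1]
        if name in binders:
            # Count abstractions up to and including the binding one.
            return ("idx", binders.index(name) + 1)
        return ("const", name)           # assumption: unbound names are constants
    if tag == "app":
        return ("app", to_de_bruijn(t[1], binders), to_de_bruijn(t[2], binders))
    return ("lam", to_de_bruijn(t[2], (t[1],) + binders))
```

Applied to the encoding of $(\lambda x\, (\lambda y\, (x\; y))\; x)$, the function returns the encoding of $(\lambda\, (\lambda\, (\#2\; \#1))\; \#1)$, and renaming the bound variables of the input leaves the result unchanged.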

It has been illustrated in the previous section that the explicit maintenance of a
substitution environment can benefit the efficiency of the β-reduction process.
The explicit encoding of substitutions in term representations provides finer
control over the reduction and substitution steps and thereby the flexibility to order
them so as to further improve the efficiency of the overall term comparison
operation. One such benefit is the ability to avoid performing substitutions unnecessarily.
For example, consider the comparison of the pair

$$(\lambda x\, (x\; t))\; a \quad\text{and}\quad (\lambda x\, (x\; s))\; b,$$

where $a$ and $b$ are different constants and $t$ and $s$ are some complicated term structures.
By reducing the redexes, the substitutions $[x := a]$ and $[x := b]$ are generated over
$(x\; t)$ and $(x\; s)$ respectively. It is obvious that the inequality of the terms is in fact
entirely decided by the results of applying the substitutions to the leading occurrences of $x$, and
is independent of the results for $t$ and $s$. With the capability to record substitutions along
with other term structures, the generation and performance of substitutions can be
completely separated in an explicit substitution calculus. This provides the chance to
delay the performance of the substitutions on $t$ and $s$, and consequently to carry out
the comparison on the structures $(a\; (t[x := a]))$ and $(b\; (s[x := b]))$, which
avoids the effort of effecting the delayed substitutions over $t$ and $s$ altogether.

Various explicit substitution calculi have been proposed for reflecting substitutions
into term structures, such as the suspension calculus [48], the λσ-calculus [1], the
λυ-calculus [6], the λζ-calculus [37] and the $\lambda s_e$-calculus [28]. Among these calculi, the
suspension calculus and the λσ-calculus are especially useful because, besides the
lazy performance of substitutions, these notations also provide support for combining
substitutions generated from different β-redexes; such a capability is essential for
realizing the sharing of structure traversal discussed in the previous section. In this
thesis we choose to use the suspension calculus because it is more closely attuned to
practical applications than the λσ-calculus.

The terms of the suspension calculus are obtained from de Bruijn terms essentially
by adding a new form that is capable of representing a term with a suspended
substitution. The full collection of terms is described formally by the syntax rules in
Figure 3.1. In these rules, $C$ represents constants, $N$ denotes the category of natural
numbers and $I$ represents the category of positive numbers. Expressions of the form
$[\![t, ol, nl, e]\!]$, referred to as suspensions, constitute the new category of terms. Intuitively,
such a suspension represents a term $t$ whose first $ol$ free variables, i.e., those
given by de Bruijn indices ranging from 1 to $ol$, should be substituted for in a way
determined by $e$ and whose other variables should be renumbered to reflect the fact
that $t$ originally appeared inside $ol$ abstractions but now appears within
$nl$ of them; $nl$ may be different from $ol$ either because some abstractions enclosing
$t$ have disappeared because of β-contractions or because $t$ is being substituted into
a context embedded within some additional abstractions. The environment $e$, which
has the structure of a list, explicitly records the substitutions to be performed for the
first $ol$ free variables in $t$: the $i$th entry in this environment is intended to be the
substitution for the $i$th free variable. Consequently, $e$ should have a length equal
to $ol$ for the term to be well-formed. Two sorts of substitutions can be recorded in
an environment. One kind of substitution corresponds to abstractions that persist
even after some abstractions within whose scope they appear disappear because of
β-contractions. Such substitutions are recorded in an environment by means of
expressions of the form $@l$, where $l$ is the count of the number of abstractions within
whose scope the one binding the variable in question occurs; the difference between
$l$ and the count of the abstractions that persist at the point of substitution, given
by $nl$ in a term of the form $[\![t, ol, nl, e]\!]$, determines the new index for the variable
being substituted for. Notice that from this discussion it follows that, for any $@l$ that
appears in the environment $e$ in a well-formed suspension $[\![t, i, j, e]\!]$, it must be the
case that $l < j$. The other sort of environment entry corresponds to the substitution
for the variable bound by an abstraction that disappears because of a β-contraction.
Such a substitution is recorded by an expression of the form $(s, l)$. The natural number
$l$ records the number of abstractions within which the β-redex whose contraction
generated the substitution is embedded; when the variable replacement is actually
carried out, $l$ is used together with the embedding level at the point of replacement
to determine an adjustment for the indexes of free variables in $s$. From this it follows
easily that a suspension $[\![t, i, j, e]\!]$ is well-formed only if it is the case that $l \leq j$ for
any $(s, l)$ contained in $e$.

$$\begin{array}{rcl}
\mathit{Term} & ::= & C \mid \#I \mid (\mathit{Term}\;\; \mathit{Term}) \mid (\lambda\, \mathit{Term}) \mid [\![\mathit{Term}, N, N, \mathit{Env}]\!] \\
\mathit{Env} & ::= & \mathit{nil} \mid \mathit{EnvTerm} :: \mathit{Env} \\
\mathit{EnvTerm} & ::= & @N \mid (\mathit{Term}, N)
\end{array}$$

Figure 3.1: The syntax of terms in the suspension calculus.

The collection of terms is complemented in the suspension calculus by a set of
rewriting rules for simulating β-reduction. The rules are presented in Figure 3.2. We
use $e[i]$ to refer to the $i$th item in an environment $e$. Among these rules, $(\beta_s)$ and $(\beta'_s)$
generate the suspended substitutions corresponding to the reduction of β-redexes;
rules (r1)-(r8), referred to as reading rules, are used to actually carry out those
substitutions.

$$\begin{array}{lll}
(\beta_s) & ((\lambda\, t_1)\; t_2) \rightarrow [\![t_1, 1, 0, (t_2, 0) :: \mathit{nil}]\!] & \\
(\beta'_s) & ((\lambda\, [\![t_1, ol+1, nl+1, @nl :: e]\!])\; t_2) \rightarrow [\![t_1, ol+1, nl, (t_2, nl) :: e]\!] & \\
(r1) & [\![c, ol, nl, e]\!] \rightarrow c & \text{provided } c \text{ is a constant} \\
(r2) & [\![\#i, ol, nl, e]\!] \rightarrow \#j & \text{provided } i > ol \text{ and } j = i - ol + nl \\
(r3) & [\![\#i, ol, nl, e]\!] \rightarrow \#j & \text{provided } i \leq ol,\ e[i] = @l \text{ and } j = nl - l \\
(r4) & [\![\#i, ol, nl, e]\!] \rightarrow [\![t, 0, j, \mathit{nil}]\!] & \text{provided } i \leq ol,\ e[i] = (t, l) \text{ and } j = nl - l \\
(r5) & [\![(t_1\; t_2), ol, nl, e]\!] \rightarrow ([\![t_1, ol, nl, e]\!]\;\; [\![t_2, ol, nl, e]\!]) & \\
(r6) & [\![(\lambda\, t), ol, nl, e]\!] \rightarrow (\lambda\, [\![t, ol+1, nl+1, @nl :: e]\!]) & \\
(r7) & [\![[\![t, ol, nl, e]\!], 0, nl', \mathit{nil}]\!] \rightarrow [\![t, ol, nl + nl', e]\!] & \\
(r8) & [\![t, 0, 0, \mathit{nil}]\!] \rightarrow t &
\end{array}$$

Figure 3.2: The rewriting rules for the suspension calculus.
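
The rules of Figure 3.2 can be exercised directly with a small interpreter. The following Python sketch is an illustration only and is not the implementation developed in this thesis: the tuple encoding of terms, the tags "at" and "cl" for the two kinds of environment entries, and the function names rewrite_root, step and normalize are all assumptions. The sketch assumes well-formed input terms, follows a leftmost-outermost strategy, and may fail to terminate on terms that have no normal form.

```python
# Terms: ("const", c), ("idx", i), ("app", t, s), ("lam", t), ("susp", t, ol, nl, env).
# Environment entries: ("at", l) stands for @l and ("cl", t, l) stands for (t, l).
def const(c): return ("const", c)
def idx(i): return ("idx", i)
def app(t, s): return ("app", t, s)
def lam(t): return ("lam", t)
def susp(t, ol, nl, env): return ("susp", t, ol, nl, tuple(env))

def rewrite_root(t):
    """Apply one rule of Figure 3.2 at the root of t, or return None."""
    if t[0] == "app" and t[1][0] == "lam":
        body, arg = t[1][1], t[2]
        if body[0] == "susp":
            _, t1, ol, nl, env = body
            if env[:1] == (("at", nl - 1),):                      # (beta_s')
                return susp(t1, ol, nl - 1, (("cl", arg, nl - 1),) + env[1:])
        return susp(body, 1, 0, [("cl", arg, 0)])                 # (beta_s)
    if t[0] != "susp":
        return None
    _, body, ol, nl, env = t
    if ol == 0 and nl == 0 and not env:
        return body                                               # (r8)
    tag = body[0]
    if tag == "const":
        return body                                               # (r1)
    if tag == "idx":
        i = body[1]
        if i > ol:
            return idx(i - ol + nl)                               # (r2)
        entry = env[i - 1]
        if entry[0] == "at":
            return idx(nl - entry[1])                             # (r3)
        return susp(entry[1], 0, nl - entry[2], [])               # (r4)
    if tag == "app":                                              # (r5)
        return app(susp(body[1], ol, nl, env), susp(body[2], ol, nl, env))
    if tag == "lam":                                              # (r6)
        return lam(susp(body[1], ol + 1, nl + 1, (("at", nl),) + env))
    if ol == 0 and not env:                                       # (r7)
        return susp(body[1], body[2], body[3] + nl, body[4])
    return None

def step(t):
    """One leftmost-outermost rewriting step anywhere in t, or None if none applies."""
    r = rewrite_root(t)
    if r is not None:
        return r
    if t[0] in ("app", "lam", "susp"):
        for k in ((1, 2) if t[0] == "app" else (1,)):
            r = step(t[k])
            if r is not None:
                return t[:k] + (r,) + t[k + 1:]
    return None

def normalize(t):
    """Rewrite until no rule applies; the result is then a de Bruijn term."""
    while (r := step(t)) is not None:
        t = r
    return t
```

As a usage example, calling step four times on the encoding of $((\lambda\, ((\lambda\, (\lambda\, ((\#1\; \#2)\; \#3)))\; a))\; b)$, with constants $a$ and $b$, applies $(\beta_s)$, (r5), (r6) and $(\beta'_s)$ in that order, and normalize returns the encoding of $(\lambda\, ((\#1\; a)\; b))$.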

We now use a concrete example to illustrate how β-reductions can be performed
in the suspension calculus. Consider the term

$$((\lambda\, ((\lambda\, (\lambda\, ((\#1\; \#2)\; \#3)))\; t_2))\; t_3),$$

where $t_2$ and $t_3$ are arbitrary de Bruijn terms. Using rule $(\beta_s)$ to reduce the outermost
redex, the term is rewritten to

$$[\![((\lambda\, (\lambda\, ((\#1\; \#2)\; \#3)))\; t_2), 1, 0, (t_3, 0) :: \mathit{nil}]\!].$$

Now the suspended substitution needs to be propagated into the top-level application,
which is accomplished by applying rule (r5):

$$([\![(\lambda\, (\lambda\, ((\#1\; \#2)\; \#3))), 1, 0, (t_3, 0) :: \mathit{nil}]\!]\;\; [\![t_2, 1, 0, (t_3, 0) :: \mathit{nil}]\!]).$$

Using rule (r6) to push the substitution into the abstraction in the suspension term
on the left, the whole term is rewritten to

$$((\lambda\, [\![(\lambda\, ((\#1\; \#2)\; \#3)), 2, 1, @0 :: (t_3, 0) :: \mathit{nil}]\!])\;\; [\![t_2, 1, 0, (t_3, 0) :: \mathit{nil}]\!]).$$

Now a new β-redex is revealed in the top-level term structure, and the reduction of
this redex can be simulated by rule $(\beta'_s)$, which directly combines the newly generated
substitution with the existing environment:

$$[\![(\lambda\, ((\#1\; \#2)\; \#3)), 2, 0, ([\![t_2, 1, 0, (t_3, 0) :: \mathit{nil}]\!], 0) :: (t_3, 0) :: \mathit{nil}]\!].$$

By applying the reading rules several times, we finally get a term of the form

$$(\lambda\, ((\#1\;\; [\![t_2, 1, 1, (t_3, 0) :: \mathit{nil}]\!])\;\; [\![t_3, 0, 1, \mathit{nil}]\!])).$$

Depending on the particular structures of $t_2$ and $t_3$, the rewrite rules can be applied
further to finally produce a β-normal form of the original term.

It can be observed that the rule $(\beta'_s)$ is in fact redundant if our only purpose is
to simulate β-reduction: whenever this rule is applicable, rule $(\beta_s)$ is applicable too.
However, this rule plays an important role in our rewriting system because it serves
to combine the substitution generated from a β-redex with those already recorded in
the environment and thus shares the term traversals for reducing nested redexes.
Rule $(\beta'_s)$ requires the redex to be reduced to have a particular pattern:

$$((\lambda\, [\![t_1, ol+1, nl+1, @nl :: e]\!])\; t_2).$$

This pattern matches the result of propagating the suspension $[\![\lambda\, t_1, ol, nl, e]\!]$ inside
the abstraction, and arises frequently in the presence of nested redexes when the
reduction process follows an outermost and leftmost order.

3.4 Head Normalization and Head Reduction Sequences 

The capability of the suspension calculus to simulate β-reductions in the conventional
λ-calculus is justified in [39] in two steps. First, it is shown that each well-formed term
in the suspension calculus can be transformed into a de Bruijn term by applying a
finite sequence of reading rules that carry out the suspended substitutions. Second,
it is shown that a de Bruijn term $t$ β-reduces to $s$ if and only if $t$ can be
transformed to $s$ by applying a finite sequence of the rules in Figure 3.2.

As noted already, it is beneficial to interleave the performance of substitutions 
also with the process of comparing terms. To justify this at a formal level, it is 
necessary to lift the notion of head normal forms to the suspension calculus. The 
following definition does this after restating the definition for such forms in the de 
Bruijn setting. 

Definition 3.4.1. A de Bruijn term is in head normal form if it has the structure

$$(\lambda \ldots (\lambda\, (\ldots (h\; t_1) \ldots t_m)) \ldots),$$

where $h$ is a constant or a de Bruijn index. As before, we call $t_1, \ldots, t_m$ the arguments
of such a term, we call $h$ its head, we call the abstractions in the front its binder and
we refer to the number of such abstractions as the binder length. By a harmless abuse
of notation, we permit the number of arguments and the binder length to be zero in such
a form. The notion of a head normal form is extended to the suspension calculus
setting by allowing the arguments of such a form to be arbitrary suspension terms.

The algorithm that we have previously described for comparing two terms in the
named calculus has an obvious adaptation to the de Bruijn setting; the essential difference
is, in fact, that the adjustment of the names of bound variables using α-conversions
is obviated. The following proposition, proved in [39], allows this algorithm to be
adapted to the suspension calculus context.

Proposition 3.4.1. Let $t$ be a de Bruijn term and suppose that the rules in Figure 3.2
allow $t$ to be rewritten to a head normal form in the suspension calculus with $h$ being
the head, $n$ being the binder length and $t_1, \ldots, t_m$ being the arguments. For $1 \leq i \leq m$, let $|t_i|$ be
the de Bruijn term obtained from $t_i$ by a (possibly empty) series of applications of the
reading rules. Then $t$ has the term

$$(\lambda \ldots (\lambda\, (\ldots (h\; |t_1|) \ldots |t_m|)) \ldots)$$

with a binder length of $n$ as a head normal form in the context of the de Bruijn
notation.

A critical part of using the comparison algorithm is that of generating a head
normal form for a term. Such a form is best generated by rewriting a head redex of
the term at each stage; a sequence of such rewritings is what is referred to as a head
reduction sequence. In the de Bruijn setting, a term that is not in head normal form
has a unique head redex that is identified as follows:

1. If the term is a β-redex, then the term itself is its head redex;

2. Otherwise, if the term is of the form $(\lambda\, t)$ or $(t\; s)$, then its head redex is that of $t$.

In this setting it is also a fact that a head reduction sequence will always succeed in
producing a head normal form for a term whenever it has such a form [4].
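
Head reduction on plain de Bruijn terms, together with the comparison algorithm of Section 3.1 adapted to this setting, can be sketched as follows. This Python code is a hedged illustration and not the procedure developed in this thesis: it performs substitution eagerly through the conventional shift and subst operations instead of using suspensions, it ignores η-conversion, and the tuple encoding and all function names are assumptions.

```python
# De Bruijn terms: ("const", c), ("idx", i), ("app", t, s), ("lam", t).
def shift(t, d, cutoff=0):
    """Add d to every index in t that is greater than cutoff (i.e. free)."""
    tag = t[0]
    if tag == "idx":
        return ("idx", t[1] + d) if t[1] > cutoff else t
    if tag == "app":
        return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))
    if tag == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    return t

def subst(t, s, depth=1):
    """Replace index depth in t by s and lower the indexes above it by one."""
    tag = t[0]
    if tag == "idx":
        if t[1] == depth:
            return shift(s, depth - 1)
        return ("idx", t[1] - 1) if t[1] > depth else t
    if tag == "app":
        return ("app", subst(t[1], s, depth), subst(t[2], s, depth))
    if tag == "lam":
        return ("lam", subst(t[1], s, depth + 1))
    return t

def head_step(t):
    """Contract the head redex of t; return None if t is in head normal form."""
    if t[0] == "lam":
        r = head_step(t[1])
        return None if r is None else ("lam", r)
    if t[0] == "app":
        if t[1][0] == "lam":                 # the term itself is a beta-redex
            return subst(t[1][1], t[2])
        r = head_step(t[1])
        return None if r is None else ("app", r, t[2])
    return None

def head_normalize(t):
    while (r := head_step(t)) is not None:
        t = r
    return t

def decompose(t):
    """Split a head normal form into (binder length, head, arguments)."""
    n = 0
    while t[0] == "lam":
        t, n = t[1], n + 1
    args = []
    while t[0] == "app":
        args.append(t[2])
        t = t[1]
    return n, t, args[::-1]

def equal(t, s):
    """Compare two terms of the same type modulo beta-conversion only."""
    n1, h1, a1 = decompose(head_normalize(t))
    n2, h2, a2 = decompose(head_normalize(s))
    if n1 != n2 or h1 != h2 or len(a1) != len(a2):
        return False
    return all(equal(x, y) for x, y in zip(a1, a2))
```

The arguments are compared in place, under the common binder, so no renaming or renumbering is needed before the recursive calls. On the two terms $(\lambda\, ((\lambda\, (\lambda\, \#2))\; \#1))$ and $(\lambda\, ((\lambda\, (\lambda\, \#1))\; \#1))$, which are the de Bruijn forms of the pair considered in Section 3.2, equal returns False because the heads of their head normal forms differ.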

In the suspension calculus, there is one more kind of term and there is also a larger
set of rewriting rules. Moreover, the use of an environment to record substitutions
also leads to the possibility of sharing subparts of terms. Taking these aspects into
account, we can generalize the notion of a head redex and define head reduction
sequences in the context of the suspension calculus as follows.

Definition 3.4.2. Let $t$ be a suspension term that is not in head normal form.

1. Suppose that $t$ has the form $(t_1\; t_2)$. If $t_1$ is an abstraction, then $t$ is its sole
head redex. Otherwise the head redexes of $t$ are the head redexes of $t_1$.

2. If $t$ is of the form $(\lambda\, t_1)$, its head redexes are identical to those of $t_1$.

3. If $t$ is of the form $[\![t_1, ol, nl, e]\!]$, then its head redexes are all the head redexes of
$t_1$ and $t$ itself, provided $t_1$ is not a suspension.

Let two subterms of a term be considered non-overlapping just in case neither is
contained in the other. Then a head reduction sequence of a suspension term $t$ is
a sequence $t = r_0, r_1, r_2, \ldots, r_n, \ldots$, in which, for $i \geq 0$, there is a term succeeding
$r_i$ if $r_i$ is not in head normal form and, in this case, $r_{i+1}$ is obtained from $r_i$ by
simultaneously rewriting a finite set of non-overlapping subterms that includes a head
redex using the rule schemata in Figure 3.2. Obviously, such a sequence terminates
if for some $m \geq 0$ it is the case that $r_m$ is in head normal form.

$$(\eta_s) \quad t \rightarrow (\underbrace{\lambda \ldots \lambda}_{n}\, ([\![t, 0, n, \mathit{nil}]\!]\;\; \#n\; \ldots\; \#1)) \qquad \text{provided } n > 0.$$

Figure 3.3: The η-rule in the suspension calculus.

The usefulness of this definition is based on the proposition below: 

Proposition 3.4.2. A term t in the suspension calculus has a head normal form if 
and only if every head reduction sequence of t terminates. 

A detailed proof of this proposition can be found in [39], which essentially maps the
head reduction sequences of suspension terms to the corresponding ones in the context 
of the de Bruijn notation. By virtue of this proposition, we can base the comparison 
of terms on a procedure that exploits the suspension form to delay substitutions and 
that essentially picks a head reduction sequence to try and reduce a given term to 
a head normal form. Notice that, unlike in the case of de Bruijn terms, there can 
actually be a choice in the head redex to rewrite at each stage. This non-determinism 
provides a flexibility that can be exploited by practical reduction procedures, a topic 
that we elaborate on in Chapter 5. 

3.5 The Suspension Calculus and η-conversions

In comparing terms, we have also to take into account that our equality notion includes
η-conversions. In the conventional setting, this fact is accommodated by allowing
the comparison procedure to use η-conversions to adjust binder lengths in case
the reduction process yielded two head normal forms for which these were unequal.
A similar adjustment can be carried out also when the suspension calculus is used.
The basis for such an adjustment is a special form of the η-rule for this setting. The
relevant rule is presented in Figure 3.3. This rule has an additional proviso when
types are associated with terms: t must have a function type that has at least n
argument types. Notice also that some of the reading rules can also be compiled
into the application of this rule when it is used to adjust the binder length in a head
normal form. Thus, the head normal form

$$\underbrace{\lambda \ldots \lambda}_{k}\,(h\ t_1 \ldots t_m)$$

can be rewritten to the form

$$\underbrace{\lambda \ldots \lambda}_{k+n}\,(h\ [\![t_1, 0, n, nil]\!] \ldots [\![t_m, 0, n, nil]\!]\ \#n \ldots \#1)$$

if h is a constant and to the form

$$\underbrace{\lambda \ldots \lambda}_{k+n}\,(\#j\ [\![t_1, 0, n, nil]\!] \ldots [\![t_m, 0, n, nil]\!]\ \#n \ldots \#1)$$

where j is i + n if h is the de Bruijn index #i.



Chapter 4 



An Abstract Interpreter for λProlog

A high-level description of the computation model of the λProlog language has been
provided in Chapter 2. This description is helpful for understanding λProlog programs,
but is not quite suitable as a basis for implementation. For the latter purpose,
concrete mechanisms have to be provided: first, for deciding proper instances
for existentially quantified variables in solving goals of the form $\exists x\, G$ and for universally
quantified variables in clauses for solving atomic goals; second, for selecting clauses
for solving atomic goals in the presence of multiple candidates as well as for picking
the disjunct to solve when processing disjunctive goals; and third, for controlling the
scopes of constants and program clauses with respect to generic and augment goals.

In this chapter, we refine the computation model appearing in Section 2.2 into an
abstract interpreter for λProlog that includes solutions to all the issues mentioned
above. We begin in Section 4.1 with the issue of finding instances for variables
existentially quantified in goals and universally quantified in clauses. Towards this
end, we introduce a new category of variables, the logic variables, into the term
representation and we generalize term comparison into an equation solving operation
called unification that is based on the new representation. Section 4.2 presents an
abstract interpreter for λProlog that uses this operation. In Section 4.3, a particular
form of the unification problem for λ-terms is described and a practical algorithm is
presented for solving such problems. Problems in this class are what are referred to as
higher-order pattern unification problems. This thesis is concerned only with solving
such problems completely and, in the rest of the thesis, we assume a refinement of the
abstract interpreter that uses only the algorithm that we present for solving such
problems.

4.1 Logic Variables and Unification 

The problem of deciding suitable instances for existentially quantified variables in 
goals and universally quantified variables in clauses is one that is also faced in the 
implementation of Prolog. It is solved in that setting by delaying the selection of an 
instance till a later point in computation when enough information is available for 
making the "right" choices. We adopt this solution also in our context. Specifically, 
(r9) $[\![X, ol, nl, e]\!] \rightarrow X$, provided X is a logic variable.

Figure 4.1: The rewriting rule in the suspension calculus for logic variables.

when a goal $\exists x\, G$ is encountered, a new variable X that can be instantiated in the
course of computation is introduced to replace x in G; this variable, which is different
from traditional variables in logic in that it can actually be instantiated in the search
for a proof, is what is known as a logic variable. Note that in a setting where types are
present, X should have the same type as the quantified variable it replaces. Once this
variable is introduced, computation proceeds to solve the goal G[x := X]. The actual
instantiation of X is determined at the point of solving the atomic goals contained
in G through unification. This is a process or computation that allows us to pick
instantiations for logic variables so as to make two terms equal. Thus, suppose that
we have reached a point where the atomic goal A' has to be solved. We then look
for a clause of the form $\forall x_1 \ldots \forall x_n\, A$ or $\forall x_1 \ldots \forall x_n\, (G \supset A)$ such that by replacing
the universally quantified variables in the front of this clause with new logic variables
$X_1, \ldots, X_n$, we get an expression of the form A'' or $G'' \supset A''$ that has the characteristic
that A'' and A' can be unified; in the second case, this leads to a subsequent attempt
to solve the corresponding instance of G''.

The unification operation generalizes the usual term comparison in the sense that
we are also allowed to compute substitutions for logic variables to make the terms
under consideration equal. There is a proviso in our context that the substitution
computed for a variable by this process should be of the same type as the variable.
Further, in the context of unifying λ-terms, substitutions for such variables should
also make sure that the free variables in the terms being introduced do not get
accidentally bound. A correct characterization of such substitutions can be provided
by using the equality of λ-terms. A substitution is typically given by a set of pairs
of the form $\{\langle x_i, t_i \rangle \mid 1 \le i \le n\}$ where the first element of the pair is the variable
being substituted for and the second element is the term that it should be replaced
with. The application of such a substitution to the term t can be given by the term

$$(\lambda x_1 \ldots \lambda x_n\, t)\ t_1 \ldots t_n.$$

Since we have to eventually deal with logic variables in an implementation, we
extend the suspension calculus to accommodate them. As we have already noted,
these variables have a different character from the usual variables in λ-terms and so
we include a new category for them in the syntax. We shall write such variables
with a starting uppercase letter. We also add a special rewrite rule pertaining to
such variables that is shown in Figure 4.1. The rule is justified by the fact that
substitutions for logic variables cannot be captured by enclosing abstractions and
hence cannot be affected by any reduction or renumbering substitutions. With the
addition of logic variables, we have also to extend our definition of head normal forms
to include the case where the head is also such a variable. We shall say now that a
head normal form is flexible if it has a logic variable as its head and that it is rigid if
the head is a constant or de Bruijn index.

A unification problem in the context of the typed λ-calculus is known as a higher-order
unification problem. Such a problem can be represented by a disagreement set
that is a finite collection of pairs of λ-terms, known as the disagreement pairs, in
which the two terms in each pair have equal types. A solution to, or a unifier for,
the problem is a substitution for logic variables (also represented as a set of pairs of
terms as discussed earlier) that makes the two terms in each pair in
the disagreement set equal when it is applied to them. A useful notion in the context
of unification is that of a most general unifier. This is a unifier for a disagreement set
from which any other unifier for the set can be obtained through further substitutions
for logic variables. Unfortunately, higher-order unification does not admit most
general unifiers. Particular problems may, in fact, have an infinite set of unifiers none
of which can be obtained from others in the set through further substitutions. A
further observation is that no procedure can be provided that computes a covering
set of unifiers in a non-redundant way. However, a non-redundant search can be
carried out to determine unifiability. Huet has in fact described a procedure that
carries out such a search [26]. This procedure computes initial portions of unifiers
that are known as pre-unifiers. In several instances, the pre-unifiers that it computes
turn out actually to be complete unifiers for the problem under consideration.

Huet's procedure consists of two phases, which are repeatedly invoked on a given
disagreement set to transform it into a form from which it can be decided that no
unifier exists or for which unifiability is evident. Since equality is based on the rules
of λ-conversion, we can assume that the two terms in each disagreement pair in a
unification problem are in head normal form and that their binders have been adjusted
to have the same length. Now, the first phase of Huet's unification procedure handles
pairs in which both terms are rigid, i.e., rigid-rigid pairs, in a way similar to term
simplification in first-order unification: depending on whether or not the two heads are
equal, the unification problem is simplified to one consisting of pairs formed out of the
arguments or non-unifiability is determined. The second phase of Huet's algorithm
considers flexible-rigid pairs, and attempts to bind the logic variables that appear as the
heads of the flexible terms. In particular, assuming the logic variable X is the head of the
flexible term, a substitution of the form
$\langle X, \lambda \ldots \lambda\, (r\ (H_1\ \#n \ldots \#1) \ldots (H_m\ \#n \ldots \#1)) \rangle$ is
produced, where $H_1, \ldots, H_m$ are new logic variables of proper types. The substituted
term has the binder length n that is decided by the number of arguments in the type
of X. The head r can be a de Bruijn index #i, for $1 \le i \le n$, when the ith argument
of the type of X has m arguments and has the same target type as
the type of X, or a constant c when the rigid term has c as its head and the type
of c has m arguments. Two observations are important to our discussion.
First, multiple bindings can be found for the
same logic variable during the binding phase, and they cannot be obtained from each
other by performing further substitutions. Second, the types of logic variables play
an important role in determining the structures of the bindings; in particular, they are
used to decide whether a de Bruijn index can be made the head of the binding term.
The details of unifying flexible-rigid pairs in Huet's algorithm are beyond the scope
of this thesis and we refer interested readers to [26]. In addition to the rigid-rigid
and flexible-rigid cases, flexible-flexible pairs may also occur during
unification. A pair of this sort is known to be always unifiable, but a complete search
for its unifiers can be unconstrained [26]. Huet's algorithm treats a set consisting of
only such pairs as a success without further exploring the underlying unifiers.
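
The case analysis that drives the two phases depends only on the heads of the two terms in a pair. The following small sketch illustrates that classification. The record type `Hnf`, its field names and the string tags are assumptions made for this example and are not the term representation used in the thesis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hnf:
    """A head normal form: `binders` abstractions over (head arg_1 ... arg_m)."""
    binders: int
    head_kind: str      # "const", "index" or "logic" (hypothetical tags)
    head: object        # constant name, de Bruijn index, or logic variable name
    args: tuple = ()

def is_flexible(t):
    # Flexible: the head is a logic variable.
    return t.head_kind == "logic"

def is_rigid(t):
    # Rigid: the head is a constant or a de Bruijn index.
    return t.head_kind in ("const", "index")

def classify(pair):
    """Name the kind of a disagreement pair, e.g. 'flexible-rigid'."""
    left, right = pair
    return "-".join("flexible" if is_flexible(t) else "rigid" for t in (left, right))
```

A simplification phase would pick out the pairs for which `classify` returns `"rigid-rigid"`, while a set in which every pair is `"flexible-flexible"` is the situation that Huet's procedure reports as a success.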

The iterative use of the term simplification and binding phases in Huet's algorithm
naturally forms a branching search. If the searching process terminates, either
non-unifiability is determined or a finite complete set of unifiers up to flexible-flexible
pairs for the given disagreement set is produced.

The undecidability of higher-order unification manifests itself in the fact
that the search conducted by Huet's procedure may not find a success at any finite
depth, i.e., the search may go on forever. Even when successes are found at finite
depth, the search may still not terminate because the number of successes to be found
may be infinite.

The theoretical properties of higher-order unification and the branching search
that must be conducted make it seem as if such unification cannot be used effectively
in a practical setting. However, the actual utilization of Huet's procedure in
several programming systems, including an implementation of λProlog that is known
as Teyjus Version 1 and that is based on attempting to solve the complete set of
higher-order unification problems, has demonstrated a practical usefulness for this
kind of computation. In particular, it has been revealed that there is a wide collection
of application tasks in which the unification problems that need to be solved in
fact have unique solutions. Based on a study of the usage of higher-order unification
in these examples, Dale Miller has identified a subset of the general problem that
is known as the $L_\lambda$ or the higher-order pattern class [36, 50]. The problems in this
subset occur when existential variables in queries and universal variables in program
clauses are used in a restricted way. Unifiability for this subset is known to be decidable
and it is also known that a single most general unifier can be provided in any
of the cases where a unifier exists. An empirical study conducted by Michaylov and
Pfenning [33] shows that even if we do not restrict the syntax of programs at the outset
to ensure that unification problems outside the $L_\lambda$ class are not generated, 95% of
the unification problems occurring in the computations underlying practical λProlog
applications are first-order, and the remaining ones evolve into problems belonging to the
$L_\lambda$ subset once substitutions determined by looking at other disagreement pairs are
made for logic variables.

When applied to higher-order pattern problems, Huet's procedure is guaranteed
to terminate and will do so with a unique successful branch. However, by taking full
advantage of the $L_\lambda$ restriction, Huet's procedure can be further improved. First,
the unique solution to a problem in the $L_\lambda$ subset may be found by Huet's procedure
only through a branching search, which is known to be expensive.
Second, Huet's procedure only partially computes the solutions for flexible-flexible
pairs, whereas the complete solution for such pairs in the $L_\lambda$ subset can be found in
a controlled way. Third, it is known that the types of logic variables have no impact
on the structures of the unifiers of $L_\lambda$ problems, and consequently the maintenance
and examination effort required by Huet's algorithm for such information becomes
completely redundant. Improvements of this sort have already been proposed by Dale
Miller, and they lead to a simpler and more efficient approach for solving pattern
unification problems. This approach is, however, described at a high level in a non-deterministic
manner. Research conducted by Nadathur and Linnell in [42] further refines Miller's
algorithm into one that is suitable to be used as the basis of actual implementations
by seriously taking the efficiency of the algorithm into account. This approach is
adopted in the implementation scheme for λProlog underlying this thesis.

A critical part of defining higher-order pattern unification problems and the algo- 
rithm for solving them is paying attention to the scopes of quantifiers that give rise 
to logic variables and constants. The logic underlying λProlog has the capability to
mix such scopes richly: for example, existential and universal quantifiers can be used
in arbitrary order over goals. However, to develop the discussion in a way that leads
naturally into an implementation of λProlog, it is useful to have available a particular
approach to encoding and treating quantifier scopes. We include this mechanism in 
an abstract interpreter in the next section before explicitly taking on the discussion 
of higher-order pattern unification in Section 4.3. 

4.2 An Abstract Interpreter 

The model of computation presented in Section 2.2 can be refined into a state transition
system whose purpose is essentially to simplify a set of goals till they all are
completely solved. The reason for considering a set of goals in a state as opposed to
a single goal is that we allow for conjunctions in G formulas: to solve such a goal, we
have to solve both conjuncts. Another thing to note is that in the presence of augment
goals it is necessary also to include the available program as a component of a state.
However, programs must parameterize the solution of particular goals and not the
entire set: in trying to solve the goal $D \supset G$, we get to use the clause D in solving
G but not in solving all the other goals present in the set. We would also like to
include in the treatment a realistic model for finding substitutions for existentially
quantified variables in goals. For this reason, we add to the state a disagreement set
representing a unification problem that still has to be solved and a substitution $\theta$ that
is proposed as a solution to parts of the overall unification problem. Finally, we need
to keep track of the constants and logic variables that have already been introduced
in the search up to this point so as to make sure we do not reuse them.

Universal goals will be treated in the abstract interpreter in a similar manner 
to that in the high-level description of computation: they will be instantiated with 
new constants. For existential quantifiers, we will use the idea of instantiating with 
logic variables as already indicated. However, we have to be careful to take into 
account the order in which these quantifiers are encountered for correctness. As a 
concrete example, consider an attempt to solve the goal $\exists y \forall z\, (p\ y\ z)$ given a program
that contains the clause $\forall x\, (p\ x\ x)$. Following the expected approach leads to the
disagreement set $\{\langle p\ Y\ c,\ p\ X\ X \rangle\}$ being given to the unification procedure; Y and
X are logic variables here that have been introduced for the purpose of instantiating 
the existential quantifier in the goal and the universal quantifier in the program 
clause and c is a new constant introduced when the universal quantifier in the goal is 
processed. Now, if we proceed naively with unification, this disagreement set can be 
solved by instantiating Y and X to c. Unfortunately, this solution is incorrect because 
instantiating X with c corresponds to producing a computation sequence according to 
the high-level description in which the constant introduced for the universal quantifier 
in the goal is not new. 

The particular point that we have to pay attention to in order to avoid bad 
solutions like that discussed above is that logic variables can only be instantiated 
with terms from a signature that is in existence at the time when these variables are 
introduced. A practical way to realize this constraint in unification is to think of the 
term universe as growing in stages, with each universal quantifier introducing a new 
stage [38]. Calling each stage a universe level, we can think of labeling each constant
with the universe level at which it enters the signature. We can then also label logic 
variables with universe levels to indicate the maximum level that can be attached to 
a constant that appears in a term instantiating the variable. 
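
The effect of universe levels on the earlier example can be seen with a toy first-order unifier. This is only an illustrative sketch under stated assumptions: terms are nested tuples of strings, names beginning with an uppercase letter are taken to be logic variables, and the functions `walk`, `atoms`, `bind` and `unify` are hypothetical helpers rather than the unification procedure used in this thesis.

```python
def is_var(t):
    # Assumed convention for this sketch: uppercase initial letter = logic variable.
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    """Follow variable bindings until an unbound variable or a non-variable is reached."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def atoms(t, subst):
    """Yield the constants and unbound variables occurring in t under subst."""
    t = walk(t, subst)
    if isinstance(t, tuple):
        for part in t:
            yield from atoms(part, subst)
    else:
        yield t

def bind(x, t, level, subst):
    found = list(atoms(t, subst))
    if x in found:
        return False                                  # occurs check
    if any(level[a] > level[x] for a in found if not is_var(a)):
        return False                                  # constant introduced after x: out of scope
    for y in found:
        if is_var(y):
            level[y] = min(level[y], level[x])        # y may now only see what x could see
    subst[x] = t
    return True

def unify(s, t, level, subst):
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return True
    if is_var(s):
        return bind(s, t, level, subst)
    if is_var(t):
        return bind(t, s, level, subst)
    if isinstance(s, tuple) and isinstance(t, tuple) and len(s) == len(t):
        return all(unify(a, b, level, subst) for a, b in zip(s, t))
    return False

# Goal "exists y, forall z, p y z" with clause "forall x, p x x":
# Y is introduced at level 0; c and then X are introduced at level 1.
bad = unify(("p", "Y", "c"), ("p", "X", "X"), {"p": 0, "c": 1, "Y": 0, "X": 1}, {})

# Goal "forall z, exists y, p y z": Y is now introduced after c, at level 1.
good = unify(("p", "Y", "c"), ("p", "X", "X"), {"p": 0, "c": 1, "Y": 1, "X": 1}, {})
```

With these labels the first attempt fails, because binding Y to X lowers the level of X to 0 and the constant c of level 1 can then no longer be substituted for it, whereas the second attempt succeeds.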

To realize this scheme within our abstract interpreter, we shall include with each
state a labeling function that assigns universe levels to (finite sets of) constants and
variables associated with the state. Since the domain of this function is finite, we
will sometimes depict it by its graph, i.e., we will show it as a set of ordered pairs.
We further associate with each goal the value of the universe level at the start of the
processing of that goal; this universe level will be manipulated by embedded universal 
goals and will be used to label logic variables that are generated in processing. We 
shall also need to make sure that substitutions for logic variables are consistent with 
labeling functions. Unification must produce substitutions that respect labelings and 
they may lead to modifications to labelings needed to ensure that subsequent sub- 
stitutions will not violate the dependencies generated by earlier ones. The following 
definition introduces the notions needed to formalize these requirements: 

Definition 4.2.1. A labeling function $\mathcal{L}$ is a mapping from a finite collection of
logic variables and constants to natural numbers. Let $\theta = \{(X_i, t_i) \mid 1 \le i \le n\}$ be a
substitution, and let $\mathcal{L}$ be a labeling function. Then $\theta$ is proper with respect to $\mathcal{L}$ if
for $1 \le i \le n$ it is the case that $\mathcal{L}(c) \le \mathcal{L}(X_i)$ for any constant c appearing in $t_i$. The
labeling induced by $\theta$ and $\mathcal{L}$ in this case is a labeling function that is written as $\mathcal{L}_\theta$.
This function behaves identically to $\mathcal{L}$ on constants and on logic variables it is such
that

$$\mathcal{L}_\theta(X) = \min(\{\mathcal{L}(X_i) \mid (X_i, t_i) \in \theta \text{ and } X \text{ appears in } t_i\})$$

if the variable is new, i.e., does not have a universe index already assigned to it, and
is

$$\mathcal{L}_\theta(X) = \min(\{\mathcal{L}(X)\} \cup \{\mathcal{L}(X_i) \mid (X_i, t_i) \in \theta \text{ and } X \text{ appears in } t_i\})$$

otherwise.
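
The two notions in Definition 4.2.1 translate almost literally into code. The sketch below is one possible reading, under the assumptions that a substitution is a dictionary from variable names to terms, that terms are nested tuples of names, and that names starting with an uppercase letter are logic variables. The function names are invented for the illustration.

```python
def is_var(name):
    # Assumed convention: logic variables start with an uppercase letter.
    return name[:1].isupper()

def symbols(term):
    """Yield every constant and variable name in a term given as nested tuples of strings."""
    if isinstance(term, tuple):
        for part in term:
            yield from symbols(part)
    else:
        yield term

def is_proper(theta, labeling):
    """theta is proper if no constant in t_i has a higher level than X_i."""
    return all(labeling[c] <= labeling[x]
               for x, t in theta.items()
               for c in symbols(t) if not is_var(c))

def induced_labeling(theta, labeling):
    """Compute the labeling induced by a proper substitution theta."""
    new = dict(labeling)                  # unchanged on constants
    for x, t in theta.items():
        for y in symbols(t):
            if is_var(y):
                # A new variable starts at the level of x; a known one can only be lowered.
                new[y] = min(new.get(y, labeling[x]), labeling[x])
    return new
```

For example, with the labeling `{"X": 1, "Y": 3, "c1": 1, "c2": 2}` the substitution `{"X": ("c1", "H", "Y")}` is proper, and the induced labeling assigns level 1 to both the new variable `H` and the existing variable `Y`. The substitution `{"X": "c2"}` is not proper.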

Having provided the intuition behind the abstract interpreter structure, we now 
begin to present it formally. The first aspect to be made precise is the structure of a 
state within the interpreter. 

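Definition 4.2.2. A computation state is a tuple of the form $\langle \mathcal{G}, \mathcal{D}, \mathcal{C}, \mathcal{V}, \mathcal{L}, \theta \rangle$ where

1. $\mathcal{G}$ is a set of triples of the form $\langle G, \mathcal{P}, N \rangle$ where G is a goal, $\mathcal{P}$ is a collection
of program clauses and N is a natural number,

2. $\mathcal{D}$ is a disagreement set,

3. $\mathcal{C}$ and $\mathcal{V}$ are (finite) sets of constants and logic variables respectively,

4. $\mathcal{L}$ is a labeling function whose domain is $\mathcal{C} \cup \mathcal{V}$, and

5. $\theta$ is a substitution for logic variables.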
The syntax that we have used for program clauses in the logic underlying λProlog
allows them to have a conjunctive structure. This is useful, for instance, in writing
augment goals but in describing computation it is preferable to be dealing only with
clauses of the form $\forall x_1 \ldots \forall x_n\, A$ or $\forall x_1 \ldots \forall x_n\, (G \supset A)$ where A is an atomic formula.
We describe a function on program clauses that allows us to extract a set of clauses
in this reduced form from them.

Definition 4.2.3. The elaboration of a program clause D, denoted by elab(D), is the
set of formulas defined as follows:

1. If D is an atomic formula A or of the form $G \supset A$, then it is $\{D\}$.

2. If D is $D_1 \wedge D_2$, then it is $elab(D_1) \cup elab(D_2)$.

3. If D is $\forall x\, D_1$ then it is $\{\forall x\, D_2 \mid D_2 \in elab(D_1)\}$.

The elaboration of a program $\mathcal{P}$ is the union of the elaborations of all the clauses in
$\mathcal{P}$.
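
As a small illustration of Definition 4.2.3, the elaboration function can be written as a recursion over the structure of a clause. The tagged-tuple encoding of formulas used here (`"atom"`, `"imp"`, `"and"`, `"all"`) is an assumption made for the sketch and not a representation taken from the implementation.

```python
def elab(d):
    """Return the set of reduced-form clauses obtained from the program clause d."""
    tag = d[0]
    if tag in ("atom", "imp"):            # case 1: A or G ⊃ A
        return {d}
    if tag == "and":                      # case 2: D1 ∧ D2
        return elab(d[1]) | elab(d[2])
    if tag == "all":                      # case 3: ∀x D1, quantifier pushed over each result
        return {("all", d[1], d2) for d2 in elab(d[2])}
    raise ValueError(f"not a program clause: {d!r}")

def elab_program(program):
    """Union of the elaborations of all clauses in the program."""
    result = set()
    for d in program:
        result |= elab(d)
    return result
```

Under this encoding, the clause `("all", "x", ("and", ("atom", "p x"), ("imp", ("atom", "q x"), ("atom", "r x"))))` elaborates to two clauses, one for `p x` and one for the implication, each with its own copy of the quantifier over `x`.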

We now formalize the notion of state transitions that underlies our abstract
interpreter for the logic underlying λProlog.

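Definition 4.2.4. A state $\langle \mathcal{G}_2, \mathcal{D}_2, \mathcal{C}_2, \mathcal{V}_2, \mathcal{L}_2, \theta_2 \rangle$ is derivable from another state of
the form $\langle \mathcal{G}_1, \mathcal{D}_1, \mathcal{C}_1, \mathcal{V}_1, \mathcal{L}_1, \theta_1 \rangle$ if one of the following holds.

1. $\langle \top, \mathcal{P}, N \rangle \in \mathcal{G}_1$, $\mathcal{G}_2 = \mathcal{G}_1 - \{\langle \top, \mathcal{P}, N \rangle\}$, $\mathcal{D}_2 = \mathcal{D}_1$, $\mathcal{C}_2 = \mathcal{C}_1$, $\mathcal{V}_2 = \mathcal{V}_1$, $\mathcal{L}_2 = \mathcal{L}_1$
and $\theta_2 = \emptyset$.

2. $\langle G_1 \wedge G_2, \mathcal{P}, N \rangle \in \mathcal{G}_1$, $\mathcal{G}_2 = (\mathcal{G}_1 - \{\langle G_1 \wedge G_2, \mathcal{P}, N \rangle\}) \cup \{\langle G_1, \mathcal{P}, N \rangle, \langle G_2, \mathcal{P}, N \rangle\}$,
$\mathcal{D}_2 = \mathcal{D}_1$, $\mathcal{C}_2 = \mathcal{C}_1$, $\mathcal{V}_2 = \mathcal{V}_1$, $\mathcal{L}_2 = \mathcal{L}_1$ and $\theta_2 = \emptyset$.

3. $\langle G_1 \vee G_2, \mathcal{P}, N \rangle \in \mathcal{G}_1$, for $i = 1$ or $i = 2$,

$$\mathcal{G}_2 = (\mathcal{G}_1 - \{\langle G_1 \vee G_2, \mathcal{P}, N \rangle\}) \cup \{\langle G_i, \mathcal{P}, N \rangle\},$$

$\mathcal{D}_2 = \mathcal{D}_1$, $\mathcal{C}_2 = \mathcal{C}_1$, $\mathcal{V}_2 = \mathcal{V}_1$, $\mathcal{L}_2 = \mathcal{L}_1$ and $\theta_2 = \emptyset$.

4. $\langle \exists x\, G, \mathcal{P}, N \rangle \in \mathcal{G}_1$, for a logic variable $X \notin \mathcal{V}_1$,

$$\mathcal{G}_2 = (\mathcal{G}_1 - \{\langle \exists x\, G, \mathcal{P}, N \rangle\}) \cup \{\langle G[x := X], \mathcal{P}, N \rangle\},$$

$\mathcal{D}_2 = \mathcal{D}_1$, $\mathcal{C}_2 = \mathcal{C}_1$, $\mathcal{V}_2 = \mathcal{V}_1 \cup \{X\}$, $\mathcal{L}_2 = \mathcal{L}_1 \cup \{(X, N)\}$ and $\theta_2 = \emptyset$.

5. $\langle D \supset G, \mathcal{P}, N \rangle \in \mathcal{G}_1$, $\mathcal{G}_2 = (\mathcal{G}_1 - \{\langle D \supset G, \mathcal{P}, N \rangle\}) \cup \{\langle G, \mathcal{P} \wedge D, N \rangle\}$, $\mathcal{D}_2 = \mathcal{D}_1$,
$\mathcal{C}_2 = \mathcal{C}_1$, $\mathcal{V}_2 = \mathcal{V}_1$, $\mathcal{L}_2 = \mathcal{L}_1$ and $\theta_2 = \emptyset$.

6. $\langle \forall x\, G, \mathcal{P}, N \rangle \in \mathcal{G}_1$, for a constant $c \notin \mathcal{C}_1$,

$$\mathcal{G}_2 = (\mathcal{G}_1 - \{\langle \forall x\, G, \mathcal{P}, N \rangle\}) \cup \{\langle G[x := c], \mathcal{P}, N + 1 \rangle\},$$

$\mathcal{D}_2 = \mathcal{D}_1$, $\mathcal{C}_2 = \mathcal{C}_1 \cup \{c\}$, $\mathcal{V}_2 = \mathcal{V}_1$, $\mathcal{L}_2 = \mathcal{L}_1 \cup \{(c, N + 1)\}$ and $\theta_2 = \emptyset$.

7. Let $\langle A, \mathcal{P}, N \rangle \in \mathcal{G}_1$, let $\forall x_1 \ldots \forall x_n\, A' \in elab(\mathcal{P})$ and, for $1 \le i \le n$, let $X_i$ be a
distinct logic variable such that $X_i \notin \mathcal{V}_1$. Further, assume

$$\mathcal{D}' = \mathcal{D}_1 \cup \{\langle A,\ A'[x_1 := X_1] \ldots [x_n := X_n] \rangle\} \quad \text{and} \quad \mathcal{L}' = \mathcal{L}_1 \cup \{(X_1, N), \ldots, (X_n, N)\}.$$

Suppose that a unification procedure applied to $\mathcal{D}'$ produces a substitution $\sigma$
that is proper with respect to $\mathcal{L}'$, and a disagreement set $\mathcal{D}''$. Then
$\mathcal{G}_2 = \sigma(\mathcal{G}_1 - \{\langle A, \mathcal{P}, N \rangle\})$, $\mathcal{D}_2 = \mathcal{D}''$, $\mathcal{C}_2 = \mathcal{C}_1$, $\mathcal{V}_2 = \mathcal{V}_1 \cup \{X_1, \ldots, X_n\}$,
$\theta_2 = \sigma$ and $\mathcal{L}_2 = \mathcal{L}'_\sigma$.

8. Let $\langle A, \mathcal{P}, N \rangle \in \mathcal{G}_1$, let $\forall x_1 \ldots \forall x_n\, (G' \supset A') \in elab(\mathcal{P})$ and, for $1 \le i \le n$, let
$X_i$ be a distinct logic variable such that $X_i \notin \mathcal{V}_1$. Further, assume

$$\mathcal{D}' = \mathcal{D}_1 \cup \{\langle A,\ A'[x_1 := X_1] \ldots [x_n := X_n] \rangle\} \quad \text{and} \quad \mathcal{L}' = \mathcal{L}_1 \cup \{(X_1, N), \ldots, (X_n, N)\}.$$

Suppose that a unification procedure applied to $\mathcal{D}'$ produces a substitution $\sigma$ that
is proper with respect to $\mathcal{L}'$, and a disagreement set $\mathcal{D}''$. Then

$$\mathcal{G}_2 = \sigma((\mathcal{G}_1 - \{\langle A, \mathcal{P}, N \rangle\}) \cup \{\langle G'[x_1 := X_1] \ldots [x_n := X_n], \mathcal{P}, N \rangle\}),$$

$\mathcal{D}_2 = \mathcal{D}''$, $\mathcal{C}_2 = \mathcal{C}_1$, $\mathcal{V}_2 = \mathcal{V}_1 \cup \{X_1, \ldots, X_n\}$, $\theta_2 = \sigma$ and $\mathcal{L}_2 = \mathcal{L}'_\sigma$.

A sequence of the form $\langle \mathcal{G}_1, \mathcal{D}_1, \mathcal{C}_1, \mathcal{V}_1, \mathcal{L}_1, \theta_1 \rangle, \ldots, \langle \mathcal{G}_n, \mathcal{D}_n, \mathcal{C}_n, \mathcal{V}_n, \mathcal{L}_n, \theta_n \rangle, \ldots$ is a derivation
sequence if the (i+1)th tuple in it is derived from the ith tuple. Such a derivation
sequence terminates if no tuple can be derived from $\langle \mathcal{G}_n, \mathcal{D}_n, \mathcal{C}_n, \mathcal{V}_n, \mathcal{L}_n, \theta_n \rangle$.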
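
To make the bookkeeping in the quantifier rules concrete, the following sketch implements only rules 2, 4 and 6 of Definition 4.2.4 on a simplified state. It is an illustration under several assumptions: goals are tagged tuples, the goal set is kept as a list whose first element is processed, bound names are assumed to be pairwise distinct so that substitution cannot capture, and fresh names are generated with a leading underscore on the assumption that no program name starts with one. The disagreement set and the substitution are omitted because these three rules leave them unchanged.

```python
def substitute(goal, x, a):
    """Replace the bound name x by a in a goal (bound names are assumed distinct)."""
    tag = goal[0]
    if tag == "atom":
        return goal[:2] + tuple(a if t == x else t for t in goal[2:])
    if tag in ("exists", "forall"):
        return (tag, goal[1], substitute(goal[2], x, a))
    return (tag, substitute(goal[1], x, a), substitute(goal[2], x, a))    # "and"

def step(goals, constants, variables, labeling):
    """Apply rule 2, 4 or 6 to the first goal triple and return the new components."""
    (goal, program, n), rest = goals[0], goals[1:]
    tag = goal[0]
    if tag == "and":                       # rule 2: both conjuncts inherit program and level
        return ([(goal[1], program, n), (goal[2], program, n)] + rest,
                constants, variables, labeling)
    if tag == "exists":                    # rule 4: new logic variable labeled with level n
        x = f"_X{len(variables) + 1}"
        return ([(substitute(goal[2], goal[1], x), program, n)] + rest,
                constants, variables | {x}, {**labeling, x: n})
    if tag == "forall":                    # rule 6: new constant labeled n + 1, goal level raised
        c = f"_c{len(constants) + 1}"
        return ([(substitute(goal[2], goal[1], c), program, n + 1)] + rest,
                constants | {c}, variables, {**labeling, c: n + 1})
    raise NotImplementedError(tag)
```

Starting from the goal `("exists", "y", ("forall", "z", ("atom", "p", "y", "z")))` at level 0, two applications of `step` produce an atomic goal at level 1 in which the logic variable carries the label 0 and the new constant carries the label 1, which is exactly the situation in which the constant may not be substituted for the variable.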

Definition 4.2.5. Let G be a closed goal formula, let $\mathcal{P}$ be a set of closed program
clauses and let $\mathcal{C}$ be the set of constants occurring in G and $\mathcal{P}$. Further, let $\mathcal{L}$ be
a labeling function of the form $\{(c, 0) \mid c \in \mathcal{C}\}$. Now assume $\mathcal{G}_1 = \{\langle G, \mathcal{P}, 0 \rangle\}$, $\mathcal{D}_1 = \emptyset$,
$\mathcal{C}_1 = \mathcal{C}$, $\mathcal{V}_1 = \emptyset$, $\mathcal{L}_1 = \mathcal{L}$ and $\theta_1 = \emptyset$. Then a derivation sequence of the form
$\langle \mathcal{G}_1, \mathcal{D}_1, \mathcal{C}_1, \mathcal{V}_1, \mathcal{L}_1, \theta_1 \rangle, \ldots, \langle \mathcal{G}_n, \mathcal{D}_n, \mathcal{C}_n, \mathcal{V}_n, \mathcal{L}_n, \theta_n \rangle, \ldots$ is said to be a $\mathcal{P}$-derivation
sequence for G. Such a sequence may terminate because no further rules are applicable
to the last tuple in it. If such termination occurs at the mth tuple because $\mathcal{G}_m$ is empty
and $\mathcal{D}_m$ is either empty or contains only flexible-flexible pairs, then the sequence is
called a $\mathcal{P}$-derivation of G. A sequence of this kind embodies a solution to the query
G in the context of the program $\mathcal{P}$ and the answer substitution corresponding to it
is obtained by composing $\theta_m \circ \ldots \circ \theta_1$ with any unifier for $\mathcal{D}_m$ and restricting the
resulting substitution to the logic variables corresponding to the top-level existentially
quantified variables in G.

An abstract interpreter for our language can be described as one that searches
for a $\mathcal{P}$-derivation of G for any closed goal G and closed program $\mathcal{P}$. The soundness
and completeness of such an interpreter with respect to the high-level description of
computation in Chapter 2 are demonstrated in [38]. Notice that our abstract interpreter
still has elements of non-determinism in it. In particular, it has to select the next goal 
to try from the collection of goals in the state, it has to make a choice between the two 
disjuncts when solving a disjunctive goal and it also needs to pick the program clause 
to try from the elaboration of the program when it reaches an atomic goal. These 
issues are present in the setting of a first-order logic language as well and similar 
solutions can be used in our context. In particular, we impose a left to right order on 
the goal set and use this order to determine the next goal to act upon, we use a left- 
to-right processing order in the treatment of disjunctive goals and we select clauses in 
solving atomic goals based on the order of their presentations in the program. There 
is no need to reconsider the order in which we select goals from the goal set. In all 
other cases we use a depth-first approach with the possibility of backtracking when 
faced with alternatives. 

Definition 4.2.5 requires that the final disagreement set consist of only flexible-flexible
pairs. The ability to produce a set satisfying this requirement depends on the
unification procedure that is used. The unification algorithm that we discuss next
is guaranteed to produce an empty disagreement set when the unification problems
that have to be solved all fall within the higher-order pattern fragment. However, we
shall sometimes apply this procedure to cases where this restriction is not satisfied.
It is then possible that the final disagreement set is not empty and contains
at least one rigid-flexible pair. In this situation, the original goal is to be understood to
be solvable provided the final disagreement set has a solution.

4.3 Higher-Order Pattern Unification

The implementation scheme underlying this thesis specializes the abstract interpreter
that we have described by using a unification algorithm that completely solves higher-order
pattern unification problems. We say that a unification problem, given by a
disagreement set, is in this class if the following syntactic constraint is satisfied by
every term in the set: for any subterm of the term that has the form $(X\ t_1 \ldots t_n)$
where X is a logic variable, it must be the case that $t_1, \ldots, t_n$ are distinct constants or
de Bruijn indexes and, further, if they are constants then they must have originated
from the processing of (essential) universal quantifiers appearing inside the scope of
the quantifier whose processing gave rise to X. Given the labeling function discussed
in the previous section, the latter condition can be stated also in the following way:
if $t_i$ is a constant c then it must be the case that $\mathcal{L}(X) < \mathcal{L}(c)$, where $\mathcal{L}$ is the labeling
function associated with the state in which the unification problem is encountered.
As a concrete example, consider the disagreement set $\{\langle (X\ c_2), (c_1\ c_2) \rangle\}$, where X is a
logic variable and $c_1$ and $c_2$ are constants. If the labeling function associated with the
state is $\{(X, 1), (c_1, 1), (c_2, 2)\}$ then this disagreement set constitutes a higher-order
pattern unification problem. However, it is not a higher-order pattern unification
problem if the labeling function is $\{(X, 2), (c_1, 1), (c_2, 2)\}$ instead. It is not too difficult
to see that with scopes corresponding to the second labeling function the problem
has two solutions: $\{(X, \lambda (c_1\ \#1))\}$ and $\{(X, \lambda (c_1\ c_2))\}$. The scoping corresponding
to the first labeling function rules out the second of these unifiers. More generally,
it has been observed that unification problems that are in the higher-order pattern
class have unique most general solutions whenever they are solvable [36].
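
The syntactic constraint just described can be checked by a simple traversal. The sketch below is a hypothetical illustration: terms are encoded as tagged tuples (`"logic"`, `"const"`, `"index"`, `"lam"`, `"app"`), the labeling is a dictionary, and the function names are invented for the example.

```python
def is_pattern(t, labeling):
    """Check the higher-order pattern restriction on every subterm of t."""
    kind = t[0]
    if kind == "lam":                                  # ("lam", body)
        return is_pattern(t[1], labeling)
    if kind == "app":                                  # ("app", head, (arg_1, ..., arg_n))
        head, args = t[1], t[2]
        if head[0] == "logic":
            if len(set(args)) != len(args):            # arguments must be distinct
                return False
            for a in args:
                if a[0] == "index":
                    continue
                if a[0] == "const" and labeling[head[1]] < labeling[a[1]]:
                    continue                           # constant introduced inside X's scope
                return False
            return True
        return is_pattern(head, labeling) and all(is_pattern(a, labeling) for a in args)
    return True                                        # bare constant, index or logic variable

def is_pattern_problem(disagreement_set, labeling):
    return all(is_pattern(s, labeling) and is_pattern(t, labeling)
               for s, t in disagreement_set)

# The example from the text: the single pair <(X c2), (c1 c2)>.
pair = (("app", ("logic", "X"), (("const", "c2"),)),
        ("app", ("const", "c1"), (("const", "c2"),)))
first = is_pattern_problem([pair], {"X": 1, "c1": 1, "c2": 2})
second = is_pattern_problem([pair], {"X": 2, "c1": 1, "c2": 2})
```

Here `first` is true and `second` is false, matching the two labeling functions considered in the example.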

The unification procedure that we will use has two phases, one for term simplification
and another for variable binding. In the first phase, rigid-rigid pairs are
handled in a way similar to that in Huet's unification algorithm by matching the
heads of terms and progressing into subproblems formed by the arguments pairwise
when the head match succeeds. In the binding phase, rigid-flexible (symmetrically,
flexible-rigid) and flexible-flexible pairs are examined and a substitution is generated
for the variable head(s) only if the flexible term(s) satisfy the higher-order pattern
restriction. The transformation of the terms to their head normal forms is assumed
implicitly prior to the application of either of these phases. We also assume a slight
modification of the term representation that collapses a sequence of abstractions into
a consolidated form: in particular, the term $(\lambda \ldots \lambda\, t)$ with a binder
length $n$ in our previous discussions is now represented as $(\lambda(n, t))$. By an
abuse of notation, we shall allow the binder length to be equal to 0, viewing
$(\lambda(0, t))$ as identical to $t$.
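
The consolidation of binders can be pictured with a small Python sketch. The tuple encoding used here, `("lam", body)` for a single abstraction and `("lam", n, body)` for the consolidated form, is a hypothetical one chosen only for illustration.

```python
def collapse(term):
    """Fold nested unary abstractions into one node with a binder length.

    Hypothetical encoding: ("lam", body) is a single abstraction and
    ("lam", n, body) is the consolidated form; anything else is left alone.
    A binder length of 0 is identified with the body itself.
    """
    n = 0
    while isinstance(term, tuple) and len(term) == 2 and term[0] == "lam":
        n += 1
        term = term[1]
    return ("lam", n, term) if n > 0 else term


nested = ("lam", ("lam", ("lam", ("app", "c", 1))))
print(collapse(nested))
```

With this form, a consumer reaches the body under all three binders in one step instead of walking three abstraction nodes.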

The binding phase of our algorithm utilizes some optimizations over Huet's procedure
that become possible when we restrict attention to the higher-order pattern
case. To understand one of these optimizations, consider a rigid-flexible disagreement
pair of the form

$\langle (X\ a_1 \ldots a_n), (r\ s_1 \ldots s_m) \rangle$,

where $X$ is a logic variable and $r$ is a constant or de Bruijn index, and assume that
the higher-order pattern requirements are satisfied. We do not show binders at the
heads of the terms in a disagreement pair here or below because these can be made
identical and, under the de Bruijn representation, they can then be ignored. Now,
a solution to this pair must rely on a substitution for $X$. Suppose that the term so
substituted has the structure $\lambda(n, r'\ t_1 \ldots t_l)$. If $r$ is a de Bruijn index
or a constant $c$ such that $\mathcal{L}(X) < \mathcal{L}(c)$, where $\mathcal{L}$ is the
relevant labeling function, $r$ cannot appear directly in a substitution for $X$ that is
proper with respect to $\mathcal{L}$. Consequently, the only way the pair can be solved is
if $r$ appears in the list of arguments for $X$, i.e., in $a_1 \ldots a_n$, and in this case
we would need to substitute for $X$ a term that projects onto the corresponding argument.
The other possibility is for $r$ to be a constant $c$ such that
$\mathcal{L}(c) \le \mathcal{L}(X)$. In this case, $c$ cannot occur in the list
$a_1, \ldots, a_n$, and for this reason, $r'$ would have to be identical to $c$. These
observations allow us to uniquely determine the head of the substitution to be generated
and to thereby avoid any of the branching that would be manifest in an application of
Huet's algorithm that is blind to the situation being considered.
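
The case analysis above amounts to a deterministic choice of head. The Python sketch below illustrates it under simplifying assumptions: binders at the heads of the two terms are ignored (as in the discussion), constants are strings, de Bruijn indices are integers, and the function name and result tags are hypothetical. The i-th of the n arguments of the variable is taken to correspond to the index #(n - i + 1) under the binder of the substitution term.

```python
def substitution_head(r, args, labels, x):
    """Head forced on the binding for x by the pair <(x a1 ... an), (r s1 ... sm)>.

    Returns ("proj", j) when the binding must project onto an argument
    (j is the de Bruijn index under the n binders), ("const", r) when the
    head must be the constant r itself, and None when no solution exists.
    """
    n = len(args)
    if isinstance(r, int) or labels[x] < labels[r]:
        # r may not occur directly in a proper substitution for x,
        # so it has to be reached through one of x's arguments.
        if r in args:
            i = args.index(r) + 1          # 1-based position of r in the arguments
            return ("proj", n - i + 1)
        return None
    # labels[r] <= labels[x]: r cannot be among the arguments, so imitate it.
    return ("const", r)


labels = {"X": 1, "c1": 1, "c2": 2}
print(substitution_head("c1", ["c2"], labels, "X"))   # imitation of c1
print(substitution_head("c2", ["c2"], labels, "X"))   # projection onto the argument
```

No branching is involved: each input admits at most one head, which is the point of the optimization.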

Another place where an optimization is possible is in the treatment of flexible-flexible
pairs. Huet's algorithm does not treat such pairs at all, as we have noted
earlier. However, if the higher-order pattern restriction is adhered to then it is possible
to solve such pairs in a most general way. For example, suppose that the pair under
consideration is of the form

$\langle (X\ a_1 \ldots a_n), (Y\ b_1 \ldots b_m) \rangle$,

where $X$ and $Y$ are distinct logic variables. Let us first assume that the quantifiers
from which $X$ and $Y$ result have (effectively) the same scopes, i.e., that
$\mathcal{L}(X) = \mathcal{L}(Y)$. Then it can be seen that a most general solution to this
pair can be given by substitutions for $X$ and $Y$ of the form

$\langle X, \lambda(n, (H\ t_1 \ldots t_k)) \rangle$ and $\langle Y, \lambda(m, (H\ s_1 \ldots s_k)) \rangle$

where $H$ is a new logic variable with the same scope as that of $X$ and $Y$ and
$t_1, \ldots, t_k$ and $s_1, \ldots, s_k$ are de Bruijn indices for variables bound by the
abstractions in the binder of the substitution terms. The purpose of the arguments in the
two substitutions is to preserve parts of the arguments in the terms in the disagreement
pairs that cannot be absorbed into any subsequent substitutions for $H$. Of course, when
this substitution is applied to the terms that are to be unified, it should produce
identical terms. From this, it is easy to see that $t_1, \ldots, t_k$ and
$s_1, \ldots, s_k$ should be such that they both generate the same permutation
$z_1, \ldots, z_k$ of the common elements of the argument lists $a_1 \ldots a_n$ and
$b_1 \ldots b_m$ of the terms in the disagreement pair.

The notation introduced in the following definition is useful in making the
substitutions described above precise.

Definition 4.3.1. Let $[a_1, \ldots, a_n]$ be a non-empty list of distinct constants or de
Bruijn indexes, and let $z$ be a constant or de Bruijn index occurring in
$[a_1, \ldots, a_n]$. Then $z \downarrow [a_1, \ldots, a_n]$ denotes the de Bruijn index
$\#(n - i + 1)$ where $i$ is such that $z = a_i$. Suppose that $[a_1, \ldots, a_n]$ and
$[z_1, \ldots, z_k]$ are two lists of distinct de Bruijn indices or constants such that
$\{z_1, \ldots, z_k\} \subseteq \{a_1, \ldots, a_n\}$; then
$[z_1, \ldots, z_k] \downarrow [a_1, \ldots, a_n]$ denotes the list $[i_1, \ldots, i_k]$
such that for $1 \le j \le k$, $i_j = z_j \downarrow [a_1, \ldots, a_n]$. We include
the case where $k = 0$ in this definition by deeming the result to be the empty list.

Using the selection operator, we can define a most general unifier for the pair of
(higher-order pattern) terms $\langle (X\ a_1 \ldots a_n), (Y\ b_1 \ldots b_m) \rangle$,
where $X$ and $Y$ are logic variables such that $\mathcal{L}(X) = \mathcal{L}(Y)$, as

$\{\langle X, \lambda(n, (H\ t_1 \ldots t_k)) \rangle, \langle Y, \lambda(m, (H\ s_1 \ldots s_k)) \rangle\}$,

where

1. $H$ is a new logic variable,

2. $[z_1, \ldots, z_k]$ is some listing of the elements of $\{a_1, \ldots, a_n\} \cap \{b_1, \ldots, b_m\}$, and

3. $[t_1, \ldots, t_k] = [z_1, \ldots, z_k] \downarrow [a_1, \ldots, a_n]$ and $[s_1, \ldots, s_k] = [z_1, \ldots, z_k] \downarrow [b_1, \ldots, b_m]$.

As a concrete example, suppose the terms to be unified are

$(X\ c_4\ c_1\ c_2\ c_3)$ and $(Y\ c_5\ c_2\ c_1\ c_3)$,

where $X$ and $Y$ are logic variables such that $\mathcal{L}(X) = \mathcal{L}(Y) = 0$, and
the $c_i$'s are constants where $\mathcal{L}(c_i) = i$, for $1 \le i \le 5$. This pair has
a most general unifier

$\{\langle X, \lambda(4, (H\ \#3\ \#2\ \#1)) \rangle, \langle Y, \lambda(4, (H\ \#2\ \#3\ \#1)) \rangle\}$.

The listing of the common argument elements in the terms to be unified that produces
the sequence of argument elements in the substitution terms is $[c_1, c_2, c_3]$.
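
The worked example can be reproduced with a few lines of Python. This is a sketch under stated assumptions: constants are strings, the function names are hypothetical, the selection operator is read as mapping the i-th of n list elements to the index #(n - i + 1) (the reading that yields the unifier shown above), and the common elements are listed in the order in which they occur in the first argument list.

```python
def select(z, a):
    """Selection operator: de Bruijn index denoting element z of list a."""
    n = len(a)
    i = a.index(z) + 1          # 1-based position of z in a
    return n - i + 1


def flex_flex_same_label(a, b):
    """Arguments of the new variable H in the bindings for X and Y.

    a and b are the argument lists of (X a1 ... an) and (Y b1 ... bm),
    with X and Y distinct variables that carry the same label.
    Returns (zs, ts, ss): the chosen listing of common elements and the
    index lists used in the bindings lambda(n, H ts) and lambda(m, H ss).
    """
    zs = [z for z in a if z in b]
    ts = [select(z, a) for z in zs]
    ss = [select(z, b) for z in zs]
    return zs, ts, ss


zs, ts, ss = flex_flex_same_label(["c4", "c1", "c2", "c3"],
                                  ["c5", "c2", "c1", "c3"])
print(zs)   # ['c1', 'c2', 'c3']
print(ts)   # [3, 2, 1]
print(ss)   # [2, 3, 1]
```

Any other listing of the common elements would give an equally general unifier; it only permutes the arguments of `H`.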

Of course, the labels on the flexible heads of the terms that are to be unified need
not be the same. Let us assume, without losing generality, that
$\mathcal{L}(X) < \mathcal{L}(Y)$. We can describe a most general unifier in this case as
well. This solution can be arrived at in two steps. The first step, which is called
raising, adjusts the head of the second term so that its scope is made identical to that
of $X$. At this point, a unifier can be generated as in the case already considered. The
main issue with the label of $Y$ being larger than that of $X$ is that some of the
constants that appear as arguments in the first term can appear in the substitution term
for $Y$. These constants are the ones that have a label that is less than or equal to that
of $Y$. We introduce the following notation to identify them collectively:

Definition 4.3.2. Given a list of distinct constants and de Bruijn indexes
$[a_1, \ldots, a_n]$, a labeling function $\mathcal{L}$ and a logic variable $Y$, let
$\{c_1, \ldots, c_k\}$ be the set of constants in $[a_1, \ldots, a_n]$ whose labels are less
than or equal to $\mathcal{L}(Y)$. Then the expression $[a_1, \ldots, a_n] \Uparrow Y$
denotes some listing of $\{c_1, \ldots, c_k\}$. Note that the set of constants satisfying
the condition may be empty, in which case $[a_1, \ldots, a_n] \Uparrow Y$ is an empty list.

The raising substitution is identified in this context to be
$\{\langle Y, Y'\ c_1 \ldots c_k \rangle\}$ where $Y'$ is a new logic variable that is
assigned the same label as $X$ and $[c_1, \ldots, c_k] = [a_1, \ldots, a_n] \Uparrow Y$.
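
The raising step can be sketched in Python as follows. The encoding (constants as strings, de Bruijn indices as integers, a dictionary for the labeling function) and the names `raise_args`, `raising_substitution` and the default fresh name `"Y'"` are assumptions introduced for this illustration.

```python
def raise_args(a, labels, y):
    """Constants among a whose labels do not exceed that of y (Definition 4.3.2)."""
    return [c for c in a if isinstance(c, str) and labels[c] <= labels[y]]


def raising_substitution(a, labels, x, y, fresh="Y'"):
    """Raising substitution for y relative to the flexible term (x a1 ... an).

    Returns the binding (y, (fresh, cs)), read as y := fresh c1 ... ck,
    together with a labeling function extended so that the fresh
    variable carries the label of x.
    """
    cs = raise_args(a, labels, y)
    new_labels = dict(labels)
    new_labels[fresh] = labels[x]
    return (y, (fresh, cs)), new_labels


labels = {"X": 0, "Y": 2, "c1": 1, "c3": 3}
binding, new_labels = raising_substitution(["c1", 1, "c3"], labels, "X", "Y")
print(binding)            # ('Y', ("Y'", ['c1']))
print(new_labels["Y'"])   # 0
```

Here only `c1` is passed to the raised variable: `c3` is introduced after `Y` and so could never have appeared in a binding for `Y` in the first place.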

To complete our consideration of the flexible-flexible case, we need also to deal
with the situation where the heads of the two terms are identical, i.e., where the pair
in question is $\langle (X\ a_1 \ldots a_n), (X\ b_1 \ldots b_m) \rangle$. This differs from
the earlier case in that the same substitution gets applied to both terms. From this it
follows easily that a most general solution is one that preserves exactly the common
elements of $[a_1, \ldots, a_n]$ and $[b_1, \ldots, b_m]$ that also appear in identical
positions in the two lists.
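
For the identical-head case, the arguments that survive in the binding are exactly those on which the two lists agree position by position. The Python sketch below illustrates this under the simplifying assumptions that binders at the heads are ignored and both argument lists have the same length n; the function name is hypothetical. The result lists the de Bruijn indices w such that the binding has the form X := lambda(n, H w1 ... wl) for a new variable H.

```python
def same_head_binding(a, b):
    """Indices kept in the binding for X from <(X a1 ... an), (X b1 ... bn)>.

    Position i (1-based) is kept when a_i and b_i coincide; it is then
    denoted by the de Bruijn index #(n - i + 1) under the n binders.
    """
    n = len(a)
    # enumerate is 0-based, so n - i below equals n - (i + 1) + 1.
    return [n - i for i, (x, y) in enumerate(zip(a, b)) if x == y]


print(same_head_binding(["c1", "c2", "c3"], ["c1", "c3", "c2"]))   # [3]
print(same_head_binding(["c1", "c2"], ["c1", "c2"]))               # [2, 1]
```

In the first call `c2` and `c3` occur in both lists but in different positions, so they are discarded; only the first argument is preserved.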

The above discussion provides an overview of the higher-order pattern unification
algorithm that is used as the basis of the implementation scheme developed in this
thesis. The actual algorithm we use is the one developed by Nadathur and Linnell [42].
This algorithm uses the fact that the partial substitutions described above are actually
most general to generate a complete solution for a flexible-rigid pair in one recursive
pass over the rigid term. (The flexible-flexible case is completely treated already by
the substitution discussed.) As is to be anticipated, this algorithm has two phases,
one for term simplification and the other for binding. The simplification phase is
characterized by the rules in Figure 4.2. In the application of these rules, a unification
problem is assumed to be given by a tuple $\langle \mathcal{D}, \theta \rangle$ where
$\mathcal{D}$ is the disagreement set under consideration and $\theta$ is a set of
substitutions which is initially empty. Further, a labeling function $\mathcal{L}$ is
assumed to be available during the entire unification process as an implicit global
component of the state. The binding phase is realized through the function mksubst that
takes as its arguments the head of (one of the) flexible term(s), the arguments of this
term and the other term in the disagreement pair. The definition of this function,
together with those of two other auxiliary ones, bnd and foldbnd, is given by the rules
in Figures 4.3, 4.4 and 4.5.

The pattern unification procedure terminates when none of the transformation
rules can be applied to the disagreement set that has been produced. If this is because
the disagreement set is empty, then a most general unifier has been computed for the
original problem. On the other hand, if the disagreement set is not empty then
non-unifiability can be concluded in a context where all disagreement pairs adhere to the
higher-order pattern restriction. Such failures are characterized concretely by the
following situations:

1. there are rigid-rigid pairs left of the form $\langle (r\ t_1 \ldots t_n), (r'\ s_1 \ldots s_m) \rangle$ where $r \neq r'$.
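(1) $\langle \langle \lambda(n, t), \lambda(n, s) \rangle :: \mathcal{D}, \theta \rangle \longrightarrow \langle \langle t, s \rangle :: \mathcal{D}, \theta \rangle$,
    provided $n > 0$.

(2) $\langle \langle \lambda(n, t), \lambda(m, s) \rangle :: \mathcal{D}, \theta \rangle \longrightarrow \langle \langle t, \lambda(m - n, s) \rangle :: \mathcal{D}, \theta \rangle$,
    provided $n > 0$ and $m > n$.

(3) $\langle \langle (r\ t_1 \ldots t_n), \lambda(m, s) \rangle :: \mathcal{D}, \theta \rangle \longrightarrow$
    $\langle \langle (([\![r, 0, m, nil]\!]\ [\![t_1, 0, m, nil]\!] \ldots [\![t_n, 0, m, nil]\!])\ \#m \ldots \#1), s \rangle :: \mathcal{D}, \theta \rangle$,
    provided $r$ is a constant or a de Bruijn index and $m > 0$.

(4) $\langle \langle (r\ t_1 \ldots t_n), (r\ s_1 \ldots s_n) \rangle :: \mathcal{D}, \theta \rangle \longrightarrow \langle \langle t_1, s_1 \rangle :: \ldots :: \langle t_n, s_n \rangle :: \mathcal{D}, \theta \rangle$,
    provided $r$ is a constant or a de Bruijn index.

(5) $\langle \langle (X\ a_1 \ldots a_n), t \rangle :: \mathcal{D}, \theta \rangle \longrightarrow \langle \sigma(\mathcal{D}), \sigma \circ \theta \rangle$,
    provided $X$ is a logic variable, $(X\ a_1 \ldots a_n)$ is $L_\lambda$ with respect to $\mathcal{L}$,
    and $\mathit{mksubst}(X, t, [a_1, \ldots, a_n]) \longrightarrow \sigma$.

Figure 4.2: Term simplification in higher-order pattern unification.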



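$\mathit{mksubst}(X, \lambda(k, X\ b_1 \ldots b_m), [a_1, \ldots, a_n]) \longrightarrow \{\langle X, \lambda(k + n, H\ w_1 \ldots w_l) \rangle\}$,
    where $H$ is a new logic variable and $\mathcal{L} = \mathcal{L} \cup \{\langle H, \mathcal{L}(X) \rangle\}$, provided
    (1) $(X\ b_1 \ldots b_m)$ is $L_\lambda$ with respect to $\mathcal{L}$ and
    (2) for $1 \le i \le n + k$, $w_i = \#(n + k - i)$, if $al[i] = b_i$ where
        $al = [[\![a_1, 0, k, nil]\!], \ldots, [\![a_n, 0, k, nil]\!], \#k, \ldots, \#1]$.

$\mathit{mksubst}(X, t, [a_1, \ldots, a_n]) \longrightarrow \{\langle X, \lambda(n, s) \rangle\} \circ \theta$,
    if the head of $t$ is not $X$ and $\mathit{bnd}(X, t, [a_1, \ldots, a_n], 0) \longrightarrow \langle \theta, s \rangle$.

Figure 4.3: Top-level control for calculating variable bindings.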
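$\mathit{bnd}(X, \lambda(m, t), [a_1, \ldots, a_n], l) \longrightarrow \langle \theta, \lambda(m, s) \rangle$,
    if $m > 0$ and $\mathit{bnd}(X, t, [a_1, \ldots, a_n], l + m) \longrightarrow \langle \theta, s \rangle$.

$\mathit{bnd}(X, r\ t_1 \ldots t_m, [a_1, \ldots, a_n], l) \longrightarrow \langle \theta, r'\ s_1 \ldots s_m \rangle$,
    provided $\mathit{foldbnd}(X, [t_1, \ldots, t_m], [a_1, \ldots, a_n], l, \langle \emptyset, [\,] \rangle) \longrightarrow \langle \theta, [s_m, \ldots, s_1] \rangle$ and
    (1) $r' = r$, if $r$ is a constant such that $\mathcal{L}(r) \le \mathcal{L}(X)$, or
    (2) $r' = r \downarrow al$, if $r$ is a de Bruijn index occurring in $al$, where
        $al = [[\![a_1, 0, l, nil]\!], \ldots, [\![a_n, 0, l, nil]\!], \#l, \ldots, \#1]$.

$\mathit{bnd}(X, Y\ b_1 \ldots b_m, [a_1, \ldots, a_n], l) \longrightarrow$
    $\langle \{\langle Y, \lambda(m, H\ c_1 \ldots c_k\ u_1 \ldots u_q) \rangle\}, H\ w_1 \ldots w_p\ v_1 \ldots v_q \rangle$,
    where $H$ is a new logic variable and $\mathcal{L} = \mathcal{L} \cup \{\langle H, \mathcal{L}(X) \rangle\}$,
    $[c_1, \ldots, c_k] = al \Uparrow Y$, $[w_1, \ldots, w_p] = [c_1, \ldots, c_k] \downarrow al$,
    $[u_1, \ldots, u_q] = zl \downarrow [b_1, \ldots, b_m]$ and $[v_1, \ldots, v_q] = zl \downarrow al$,
    with $al = [[\![a_1, 0, l, nil]\!], \ldots, [\![a_n, 0, l, nil]\!], \#l, \ldots, \#1]$ and
    $zl = [z_1, \ldots, z_q]$ as a permutation of
    $\{[\![a_1, 0, l, nil]\!], \ldots, [\![a_n, 0, l, nil]\!], \#(l - 1), \ldots, \#1\} \cap \{b_1, \ldots, b_m\}$,
    provided $X$ and $Y$ are distinct logic variables such that $\mathcal{L}(X) \le \mathcal{L}(Y)$,
    and $Y\ b_1 \ldots b_m$ is $L_\lambda$ with respect to $\mathcal{L}$.

$\mathit{bnd}(X, Y\ b_1 \ldots b_m, [a_1, \ldots, a_n], l) \longrightarrow$
    $\langle \{\langle Y, \lambda(m, H\ w_1 \ldots w_p\ v_1 \ldots v_q) \rangle\}, H\ c_1 \ldots c_k\ u_1 \ldots u_q \rangle$,
    where $H$ is a new logic variable and $\mathcal{L} = \mathcal{L} \cup \{\langle H, \mathcal{L}(X) \rangle\}$,
    $[c_1, \ldots, c_k] = bl \Uparrow X$, $[w_1, \ldots, w_p] = [c_1, \ldots, c_k] \downarrow bl$,
    $[v_1, \ldots, v_q] = zl \downarrow bl$ and
    $[u_1, \ldots, u_q] = zl \downarrow [[\![a_1, 0, l, nil]\!], \ldots, [\![a_n, 0, l, nil]\!], \#l, \ldots, \#1]$, with
    $bl = [b_1, \ldots, b_m]$ and $zl = [z_1, \ldots, z_q]$ as a permutation of
    $\{[\![a_1, 0, l, nil]\!], \ldots, [\![a_n, 0, l, nil]\!], \#(l - 1), \ldots, \#1\} \cap \{b_1, \ldots, b_m\}$,
    provided $X$ and $Y$ are distinct logic variables such that $\mathcal{L}(Y) < \mathcal{L}(X)$,
    and $Y\ b_1 \ldots b_m$ is $L_\lambda$ with respect to $\mathcal{L}$.

Figure 4.4: Calculating variable bindings.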



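$\mathit{foldbnd}(X, [\,], al, l, \langle \theta, sl \rangle) \longrightarrow \langle \theta, sl \rangle$.

$\mathit{foldbnd}(X, [t_1, \ldots, t_n], al, l, \langle \theta, sl \rangle)$
    $\longrightarrow \mathit{foldbnd}(X, [\sigma(t_2), \ldots, \sigma(t_n)], al, l, \langle \sigma \circ \theta, s :: sl \rangle)$,
    provided $n > 0$ and $\mathit{bnd}(X, t_1, al, l) \longrightarrow \langle \sigma, s \rangle$.

Figure 4.5: Iterating the variable binding calculation over an argument list.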
2. the attempt to apply a bnd rule encounters a tuple of the form

$\langle X, r\ t_1 \ldots t_m, [a_1, \ldots, a_n], l \rangle$

where $r$ is a de Bruijn index or a constant with $\mathcal{L}(r) > \mathcal{L}(X)$ that does not
occur in $[[\![a_1, 0, l, nil]\!], \ldots, [\![a_n, 0, l, nil]\!], \#l, \ldots, \#1]$.

3. the attempt to apply a bnd rule encounters a tuple of the form

$\langle X, X\ b_1 \ldots b_m, [a_1, \ldots, a_n], l \rangle$.

In analogy with first-order unification, the first of these failures corresponds to a clash
of constants and the latter two constitute failure because of an "occurs-check."
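
The first kind of failure, and the decomposition step that it blocks, can be illustrated with a short Python sketch. It assumes a simplified encoding in which a rigid term in head normal form is a pair of a head and a list of arguments, with binders already stripped; the names `Clash` and `decompose_rigid_rigid` are hypothetical.

```python
class Clash(Exception):
    """Raised when two rigid heads differ, so the pair has no unifier."""


def decompose_rigid_rigid(t, s):
    """Split <(r t1 ... tn), (r s1 ... sn)> into the pairs <ti, si>.

    Each term is given as (head, [arguments]). The heads must coincide
    and, as a consequence of typing, the argument counts must agree.
    """
    (r1, targs), (r2, sargs) = t, s
    if r1 != r2 or len(targs) != len(sargs):
        raise Clash(f"cannot unify heads {r1!r} and {r2!r}")
    return list(zip(targs, sargs))


print(decompose_rigid_rigid(("c", ["a", "b"]), ("c", ["x", "y"])))
try:
    decompose_rigid_rigid(("c", ["a"]), ("d", ["a"]))
except Clash as failure:
    print("failure:", failure)
```

The two occurs-check style failures arise later, inside the binding phase, and are not modeled here.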

As we have already noted, our implementation of λProlog allows for a more liberal
syntax that could lead to disagreement pairs that do not satisfy the higher-order
pattern restriction. Given this, a non-empty disagreement set may also be a signal of
the fact that the unification process should be suspended until further variable bindings
have been determined in some other way. The actual realization of the unification
procedure should therefore be on the lookout for errant disagreement pairs and should
defer the processing of these to a later point; the structure of the abstract interpreter
already accommodates such a possibility.

The last issue to be mentioned in this section is the usage of types in the pattern
unification procedure. Unlike in Huet's algorithm, types are not needed during the
binding phase of unification, but they are relevant to the applicability of rule (4) in
term simplification. In particular, two constants with the same name are viewed as
being equal only when they have equal types and this forces the terms in this situation
to have the same number of arguments. Based on the type system of our language,
the determination of the equality of types is carried out by first-order unification,
which should interleave with the application of rule (4). The details of the treatment
of types relative to pattern unification are discussed in Chapter 7.
Machine-Level Term Representation

The discussions in the previous chapters have gradually refined the representation of
λ-terms from a conceptual form into one more suitable to be used as a basis of
implementation. Two issues still remain to be dealt with in a concrete implementation.
First, we need an actual procedure for converting terms to head normal form. Second,
we still have to discuss the reflection of the suspension calculus into lower-level
machine structures. We take these issues up in this chapter. In Section 5.1 we discuss
different strategies for producing a head normal form for terms in the suspension
calculus, leading eventually to one that has been shown empirically to have good time
and space characteristics. Section 5.2 then describes the low-level encoding of λ-terms
used in our implementation and it also discusses the pragmatic issues underlying our
choices.

5.1 Implementation of Head Normalization 

An efficient implementation of the normalization of terms is clearly important to the
performance of an overall system realizing the λProlog language. The suspension
calculus serves as a suitable basis for such an implementation by providing control
over the substitution operation and, hence, flexibility in the ordering of the steps
involved in reduction. A high-level, non-deterministic description of the process of
reducing a term to one of its head normal forms has also been identified in Chapter 3;
this is a process in which a head normal form is produced by repeatedly rewriting a
head redex. Once a head normal form has been produced, there is still some flexibility
with regard to how to treat the arguments of the term. For example, consider
the term

$(\lambda(n, [\![(c\ t_1 \ldots t_m), ol, nl, e]\!]))$,

where $c$ is a constant and $t_1, \ldots, t_m$ are arbitrary terms. The application of
rules (r5) and (r1) in Figure 3.2 results in the structure

$(\lambda(n, (c\ [\![t_1, ol, nl, e]\!] \ldots [\![t_m, ol, nl, e]\!])))$,

that is a head normal form. At this point, there are choices in what to do with
the arguments: whether to leave them as suspensions, to transform them into de
Bruijn terms, or perhaps even to reduce them too to normal forms. Different reduction
strategies can be characterized in terms of the choices that they make at this stage.

One reduction strategy that can be considered is that which uses the suspension
calculus only as an implementation device, keeping explicit representations only of
terms in de Bruijn form. Within this strategy, the old and new embedding levels
and the environment in a suspension would be reflected in the parameters of the
reduction procedure but not in terms. Consequently, the substitutions remaining
on the arguments of the head normal form shown above would have to be carried
out eagerly, possibly combined with additional β-reductions applied to these terms.
In this strategy, the rewriting steps shown in Figure 3.2 and Figure 4.1 would be
carried out implicitly and hence would not themselves give rise to intermediate terms.
An alternative strategy would be one that dispenses with the recursive structure of
the first reduction procedure by explicitly creating the right-hand sides of
each of the rewrite rules and by using a stack to provide any additional control.
Such a procedure would have to be complemented by an explicit representation of
suspensions and hence it could also potentially leave the arguments of a head normal
form as suspensions.

A drawback with the second approach is that it requires new terms to be explicitly
created as the result of each rewriting step, even if the terms only serve as intermediate
results of the head normalization process. For instance, consider the original term in
the previous example. As the result of the application of the rule (r5) in Figure 3.2,
this approach requires the explicit creation of the structure

$(\lambda(n, ([\![c, ol, nl, e]\!]\ [\![t_1, ol, nl, e]\!] \ldots [\![t_m, ol, nl, e]\!])))$,

only to see the head $[\![c, ol, nl, e]\!]$ being rewritten by the immediately following step
through an application of rule (r1). As another example, it is possible for the term
$t$ in the suspension $[\![t, ol, nl, e]\!]$ to be a β-redex, in which case new suspensions will
be created through the use of rules (r5) and (r6) only to be discarded when the rule
$(\beta_s)$ is applied.

The redundancy in the creation of the intermediate terms is avoided by the first
strategy. However, the eager performance of the substitutions over the arguments of
head normal forms leads to a traversal of these arguments, which may turn out to
be redundant in a context where term comparison can be interleaved with reduction
steps; just exposing the heads may suffice to show non-unifiability. Performing just
substitutions also misses out on the sharing of walks between different reduction steps.
For example, consider the pair of terms

$\langle (c\ [\![(\lambda\, t_1)\ t_2, 1, 0, (c, 0) :: nil]\!]), (c\ t_3) \rangle$,

where $c$ is a constant and $t_1$, $t_2$ and $t_3$ are arbitrary terms. The application of the
term simplification rule (4) in Figure 4.2 results in a new pair of terms formed by the
arguments of the original terms, which need to be head normalized immediately. If
the transformation of $[\![(\lambda\, t_1)\ t_2, 1, 0, (c, 0) :: nil]\!]$ into a de Bruijn term is
carried out eagerly at the end of the previous invocation of head normalization, as
required by the eager substitution strategy, a separate traversal has to be carried out
over the structure of $t_1$ when the redex $(\lambda\, t_1)\ t_2$ is rewritten. Of course, we
could also reduce such redexes when calculating out suspension terms. However, this
corresponds to always producing β-normal forms, something that is costly especially in a
setting where failure can be registered by looking at only parts of terms.

The above discussion of the characteristics of the two strategies that we have
considered suggests an intermediate version that combines the benefits of both: the
normalization procedure can use suspensions implicitly, embedding their components
in its arguments rather than in explicitly constructed suspensions but, in the end,
leaving the arguments in the head normal forms it finds in the form of suspensions.
Studies have been conducted in [31] using the Teyjus Version 1 implementation of
λProlog to understand the performance differences between the combination reduction
strategy just described and the first strategy considered, which evaluates
substitutions eagerly on the arguments of head normal forms. These studies indicate a
significant performance benefit to the combination strategy: specifically, an average
of 32% reduction in execution time and 81% reduction in memory usage was observed
over a set of practical $L_\lambda$-style programs with this strategy. We have accordingly
chosen to use this combination strategy in the new implementation of λProlog. In the
rest of this section, we elaborate on the structure of the reduction procedure used in
the implementation. To keep this description brief and understandable, we present
this structure through SML style pseudo-code.

The first task in presenting the procedure is to provide datatype declarations for
the terms in the suspension calculus. These declarations are given in Figure 5.1.
As in usual implementations, a graph-based representation is assumed for terms.
SML expressions of types rawterm and term can be viewed as directed graphs, which
are assumed to be acyclic during the reduction process. It can in fact be observed
from the head normalization procedure discussed subsequently that if the input to
the procedure has this property, it is preserved in the normalization process.

    datatype rawterm = const of string
                     | lv of string
                     | db of int
                     | ptr of (rawterm ref)
                     | lam of (rawterm ref)
                     | app of (rawterm ref) * (rawterm ref)
                     | susp of (rawterm ref) * int * int * (eitem list)
    and eitem = dum of int
              | bndg of (rawterm ref) * int

    type env = (eitem list)
    type term = (rawterm ref)

Figure 5.1: An SML encoding of suspension terms in head normalization.

Terms of the suspension calculus are realized as references to appropriate SML
expressions of the type rawterm. The environments and environment items in this
calculus are represented as expressions of types env and eitem. An expression of the form
dum(l) is used to encode the environment item @l, whereas bndg(t, l) corresponds to
(t, l). The value constructors lv and db are used to encode logic variables and de Bruijn
indexes respectively. The encoding of abstractions, applications and suspensions is
achieved by applying the constructors lam, app and susp to arguments of the proper types.
The constructor ptr serves to aid the sharing of reduction results: at certain points in
our reduction process, we want to identify (the representations of) terms in a way that
makes the subsequent rewriting of one of them correspond to the rewriting of the others.
Such an identification is usually realized by representing both expressions as pointers
to a common location whose contents can be changed to effect shared rewritings. In SML it
is possible to update only references and so the common location itself must be a pointer.
The constructor ptr is used to encode indirections of this kind when they are needed.
Complementing this encoding, we use the following functions to, respectively, dereference
a term and assign a new value to a given term.

    fun deref(term as ref(ptr(t))) = deref(t)
      | deref(term) = term

    fun assign(t1, ref(ptr(t))) = assign(t1, t)
      | assign(t1, t2) = t1 := ptr(t2)
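
The effect of these two functions on sharing can be seen in a small Python analogue. This is an illustrative sketch, not the implementation: the `Cell` class stands in for an SML reference, cell contents are tuples whose first component is a tag, and the guard against redirecting a cell to itself is an addition that the SML version above does not have.

```python
class Cell:
    """A mutable location, playing the role of an SML `rawterm ref`."""

    def __init__(self, content):
        self.content = content      # a tuple such as ("const", "c") or ("ptr", cell)


def deref(cell):
    """Follow ptr indirections until a non-pointer cell is reached."""
    while cell.content[0] == "ptr":
        cell = cell.content[1]
    return cell


def assign(target, new):
    """Overwrite target with an indirection to the dereferenced new term."""
    final = deref(new)
    if final is not target:         # added guard: never point a cell at itself
        target.content = ("ptr", final)


shared = Cell(("susp", "t", 1, 0, []))    # a term that two parents refer to
alias = Cell(("ptr", shared))             # one of those references
result = Cell(("const", "c"))             # the outcome of rewriting `shared`
assign(shared, result)
print(deref(alias) is result)             # True: the rewriting is seen through alias
```

Every holder of a reference to `shared` now observes the rewritten term, which is the sharing of reduction results that the ptr constructor is meant to support.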

In addition, we use the following function to help with looking for a value in an 
environment during the reduction process. 

    fun nth(x::l, 1) = x
      | nth(x::l, n) = nth(l, n-1)

Based on the given SML encoding of the terms of the suspension calculus, the main
work of the head normalization procedure can be defined as in Figure 5.2. The
first four arguments are used to represent a (possibly trivial) suspension implicitly.
The fifth argument, of boolean type, is used to ensure that the rewriting of head
redexes is performed in a left-most and outer-most order: it is set to true when the
term under reduction has been found as the function part of an application at the
outside, and the normalization process stops rewriting the redexes contained in it
once an abstraction structure is revealed, so that the outer redex can be rewritten
first. (There is in fact one exception to the outer-most order of rewriting in the
presence of nested suspensions, which will be explained shortly.) The application of
the $(\beta_s)$ and $(\beta'_s)$ rules in Figure 3.2 is carried out in the application case of hnorm.
Further, when the head of a head normal form is exposed and the head normal form
still has an application structure, the implicitly recorded non-trivial suspensions over
the arguments are explicitly reflected into the term structure. The value returned by
hnorm is a quadruple that can be interpreted as an implicit suspension. In reality,
this suspension is a trivial one in all cases other than when the call to hnorm has its
fifth argument set to true and the term component in the resulting suspension
is an abstraction.

When the call to hnorm on the inner suspension has its fifth argument set to
true, it is possible that the returned value of the call is a non-trivial suspension with
its term component being an abstraction. This suspension should be made explicit,
and further, it should be transformed into an abstraction using the reading rule (r6)
in Figure 3.2 before computation can proceed. The described behavior is carried out
by the suspension case of the procedure hnorm. The effect of making the suspension
returned by the rewriting of the inner suspension explicit after applying rule (r6) is
accomplished by an invocation of the auxiliary function mk_explicit, defined as
follows.

    fun mk_explicit(t, 0, 0, nil) = t
      | mk_explicit(ref(lam(t)), ol, nl, e) =
          ref(lam(ref(susp(t, ol+1, nl+1, dum(nl)::e))))

Any given term t may be transformed into a head normal form by invoking the
interfacing procedure head_norm that is defined as follows:

    fun head_norm(t) = hnorm(t, 0, 0, nil, false)

The correctness of head_norm is the content of the following theorem, whose proof
can be found in [61].

Theorem 5.1.1. Let t' be a reference to the representation of a suspension term t
that translates via the reading rules (r1)-(r8) in Figure 3.2 and (r9) in Figure 4.1 to a
de Bruijn term with a head normal form. Then head_norm(t') terminates and, when
it does, t' is a reference to the representation of a head normal form of the original
term in the suspension calculus.
    (* hnorm(t, ol, nl, e, whnf): head normalize the implicit suspension
       [[t, ol, nl, e]]; the result is again an implicit suspension. *)
    fun hnorm(term as ref(db(i)), 0, 0, [], _) = (term, 0, 0, [])
      | hnorm(term as ref(db(i)), ol, nl, e, whnf) =
          if (i > ol) then (ref(db(i-ol+nl)), 0, 0, nil)   (* index free in e *)
          else (fn dum(l) => (ref(db(nl-l)), 0, 0, nil)
                 | bndg(t,l) =>
                     (fn ref(susp(t2,o,n,e2)) => hnorm(t2, o, n+nl-l, e2, whnf)
                       | t' => hnorm(t', 0, nl-l, [], whnf)) (deref(t))) (nth(e,i))
      | hnorm(term as ref(lam(t)), ol, nl, e, true) = (term, ol, nl, e)
      | hnorm(term as ref(lam(t)), ol, nl, e, false) =
          let val (t',ol',nl',e') =
                if (ol=0) andalso (nl=0) then hnorm(t, 0, 0, [], false)
                else hnorm(t, ol+1, nl+1, dum(nl)::e, false)
          in (ref(lam(t')), ol', nl', e') end
      | hnorm(term as ref(app(t1,t2)), ol, nl, e, whnf) =
          let val (f,fol,fnl,fe) = hnorm(t1, ol, nl, e, true)
          in (fn ref(lam(t)) =>           (* head is a beta-redex: contract it *)
                 let val t2' = if ((ol=0) andalso (nl=0)) then t2
                               else ref(susp(t2, ol, nl, e))
                     val s as (t',ol',nl',e') =
                           hnorm(t, fol+1, fnl, bndg(t2',fnl)::fe, whnf)
                 in ((if (ol<>0) orelse (nl<>0) orelse (ol'<>0) orelse (nl'<>0)
                      then () else assign(term, t')); s) end
               | t => if ((ol=0) andalso (nl=0))
                      then (assign(term, ref(app(f, t2))); (term, 0, 0, nil))
                      else (ref(app(f, ref(susp(t2, ol, nl, e)))), 0, 0, nil))
             (deref f) end
      | hnorm(term as ref(susp(t,ol,nl,e)), ol', nl', e', whnf) =
          let val s = mk_explicit(hnorm(t, ol, nl, e, whnf))
          in (assign(term, s);
              if (ol'=0) andalso (nl'=0) then (s, 0, 0, nil)
              else hnorm(term, ol', nl', e', whnf)) end
      | hnorm(ref(ptr(t)), ol, nl, e, whnf) = hnorm(deref(t), ol, nl, e, whnf)
      | hnorm(term, _, _, _, _) = (term, 0, 0, nil)

Figure 5.2: An environment based head normalization procedure with lazy substitutions.
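
The de Bruijn index case of hnorm is the place where an environment is actually consulted. The following Python sketch isolates that lookup. It is a simplified illustration under stated assumptions: environment items are the tuples `("dum", l)` and `("bndg", t, l)`, results are tagged tuples, the renumbering of an index that lies outside the environment is taken to be i - ol + nl, and the merging of nested suspensions performed by hnorm is left out.

```python
def read_index(i, ol, nl, env):
    """Resolve the de Bruijn index #i in the suspension [[#i, ol, nl, env]].

    env is a list of length ol whose items are ("dum", l) or ("bndg", t, l).
    Returns ("db", j) for a renumbered index, or ("susp", t, 0, k, []) for
    a bound term t that still has to be raised over k new binders.
    """
    if i > ol:
        # The index points outside the environment: only renumber it.
        return ("db", i - ol + nl)
    item = env[i - 1]                    # environments are indexed from 1
    if item[0] == "dum":
        return ("db", nl - item[1])      # variable bound by a surviving abstraction
    _, t, l = item
    return ("susp", t, 0, nl - l, [])    # substituted term, adjusted lazily


print(read_index(3, 1, 2, [("dum", 0)]))        # ('db', 4)
print(read_index(1, 1, 2, [("dum", 0)]))        # ('db', 2)
print(read_index(1, 1, 2, [("bndg", "t", 1)]))  # ('susp', 't', 0, 1, [])
```

In the last case no traversal of `t` takes place; the adjustment is recorded in a new suspension and carried out only if the structure of `t` is later examined.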
5.2 Representation of Terms

We discuss now the scheme for encoding terms that will become the basis for their
manipulation in the abstract machine for λProlog.

The most natural encoding for a term is one that uses a memory unit with a tag
indicating the syntactic category of the term with additional parts for any other
components. These additional components vary according to the specific kinds of term.
For a de Bruijn index, all that is needed is a positive number for the index itself. As
discussed in Chapter 4, a labeling function associating constants and logic variables
with their universe levels is essential to the unification operation. This information is
succinctly maintained by recording numeric tags of non-negative integer values
along with constants and logic variables. In addition to such a label, a reference
to its descriptor should also be maintained with a constant. An (un-instantiated)
logic variable should serve as a place holder occupying enough space so that the
instantiation can be realized by destructively changing the cell to another sort of term.
The content of this cell is not important except for its tag and label. A suspension
term $[\![t, ol, nl, e]\!]$ requires the maintenance of its two embedding levels ol and nl, a
reference to its term component t and a reference to its environment e, which can be
represented as a list. An abstraction cell contains a positive number corresponding
to the binder length and a reference to its body. In this way, nested abstraction
structures can be denoted by a single term. An alternative in this encoding is to
require each abstraction to be represented separately. However, considering the term
decomposition requests issued by the pattern unification algorithm described in
Section 4.3, it is apparent that faster access to the subcomponents of nested abstractions
can be supported by the encoding we have chosen.

The encoding of application terms requires more careful consideration. Applications
in a higher-order setting are best thought of in a curried fashion, thus making
their components (references to) their function and argument parts, respectively.
However, a curried rendition of applications leads to a high cost in the most common
form of access to terms needed by unification: the access to the head of a head normal
form with n arguments requires working through n applications starting from
the outermost one. In addition, it can also be observed that the pattern unification
algorithm discussed in Section 4.3 is best supported if the arguments of a flexible
term in head normal form are available as a vector. The ability to immediately access
the head and argument vector of an application is also useful when we consider
the compilation of unification. If a curried representation is used, runtime effort has
to be expended to traverse nested applications for the purpose of exposing their structure
in this form before the rest of the computation can proceed.

A concrete encoding of applications that is reminiscent of their treatment in
conventional logic language implementations is to use a structure containing three
components: a function part, (a reference to) a vector of arguments and an arity 
corresponding to the size of the vector. Such a representation has especially nice 
properties in our setting when the program at hand is a first-order one. In this case 
the head normal form of the term is already available at compilation time. With the 
described representation, the head and the argument vector information can be sim- 
ply obtained from the top-level term structure, which also lets it be determined that 
reduction is not necessary. These benefits appear to be important since efficiency in 
realizing first-order style computations is of special importance to the overall
performance in practical λProlog applications [33]. Our low-level representation accordingly
adopts such an encoding for applications. In the first-order context, term structures 
can be modified only via bindings for (first-order) logic variables which cannot ap- 
pear in the function position in an application. Thus, applications themselves have 
an unchanging structure. Taking advantage of this fact, the first-order representation 
of an application usually folds the function part and the argument vector into one 
contiguous sequence of terms. This optimization can, however, not be used in our 
setting where the heads of applications can also sometimes change. For this reason, 
references are maintained in an application referring to the function part of it and its 
argument vector respectively. 
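
As an illustrative sketch only, and not the encoding used in the actual implementation, the cell categories discussed above can be modeled in Python. All class and field names here are hypothetical. The sketch contrasts the chosen application encoding, in which the head and the argument vector are available from the top-level cell, with a curried encoding, in which reaching the head of a term with n arguments requires walking through n application cells.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Const:
    name: str            # stands in for the descriptor of the constant


@dataclass
class Var:
    label: str           # place holder cell for an uninstantiated logic variable


@dataclass
class Abs:
    binder_len: int      # number of nested abstractions folded into one cell
    body: "Term"


@dataclass
class App:
    func: "Term"         # reference to the function part
    args: List["Term"]   # reference to the argument vector

    @property
    def arity(self) -> int:
        # Derived here for brevity; the text stores the arity in the cell.
        return len(self.args)


Term = Union[Const, Var, Abs, App]


def curried_head(t):
    """Walk nested application cells down to the head, counting cells visited."""
    steps = 0
    while isinstance(t, App):
        t = t.func
        steps += 1
    return t, steps


a, b, c, f = Const("a"), Const("b"), Const("c"), Const("f")

# (f a b c) in the vector encoding: one cell exposes head and arguments.
flat = App(f, [a, b, c])

# The same term in a curried encoding: ((f a) b) c.
curried = App(App(App(f, [a]), [b]), [c])

head, steps = curried_head(curried)
```

In the vector encoding `flat.func` and `flat.args` are read directly, whereas `curried_head` visits one cell per argument before the head is exposed.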

The only remaining category of terms provided for in our representation is that
of references. References are necessitated by the fact that we use a graph-based
realization of reduction to foster sharing, as should be clear from the SML rendition of
the procedure that we have provided in the previous section. Thus, the destructive
update of the term (λ t1) t2 can be effected by changing the application cell representing
this term into a reference to the representation of the term ⟦t1, 1, 0, (t2, 0) :: nil⟧.
Notice that a reference has the smallest amount of data amongst all the terms (its
encoding needs just a category tag and a pointer to another term) and so it can
be used conveniently in such destructive updates. Another use for a reference is in
recording the binding of a logic variable. For example, the binding of a logic variable
X to a term t can be registered by changing the cell for X into a reference to the
representation of t.
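
The use of references for recording variable bindings can be sketched as follows. This is a minimal model under assumed names (`Cell`, `deref` and `bind` are hypothetical and do not come from the implementation): a binding overwrites the variable cell in place with a reference, and later accesses follow reference chains to reach the actual term.

```python
class Cell:
    """A mutable term cell: a category tag plus a payload."""

    def __init__(self, tag, payload=None):
        self.tag = tag          # "var", "const" or "ref" in this sketch
        self.payload = payload  # a name, or the target cell for a "ref"


def deref(cell):
    """Follow reference cells until a non-reference cell is reached."""
    while cell.tag == "ref":
        cell = cell.payload
    return cell


def bind(var_cell, term_cell):
    """Register a binding by destructively turning the variable cell into a reference."""
    assert var_cell.tag == "var"
    var_cell.tag = "ref"
    var_cell.payload = term_cell


x = Cell("var", "X")
y = Cell("var", "Y")
t = Cell("const", "a")

bind(x, y)   # X now refers to the cell for Y
bind(y, t)   # Y now refers to the constant a
```

Every holder of a pointer to the cell for X observes the binding, since the cell itself is changed rather than copied.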

As we have noted in Section 4.3, types may sometimes be needed at run-time for 
the purpose of determining the identity of constants. Consequently, we need to have 
an explicit encoding for them as well in the representation of constants. In particular, 
a constant cell gets an extra component in the form of a reference to its type. As for 
the encoding of types themselves, since the computation on them is solely first-order 
unification, it is sufficient to adopt the conventional first-order style encoding. For 
the purpose of minimizing the run-time cost on the maintenance and manipulation 
of types, this approach can be further refined by separating a type into a fixed part 
that is available during compilation and a dynamic part that should be decided at
run time. The former information can be combined with the descriptor of a given 
constant, and the association of a constant cell with its type is then reduced to only 
the dynamic part. A detailed discussion on this topic appears in Section 7.4 after we 
have obtained a concrete understanding of the run-time type processing scheme in 
our implementation. 



Chapter 6 



An Abstract Machine and Processing Model 

We are interested in this thesis in a compilation-based model for realizing λProlog.
One possible target for a compiler that emerges from our considerations could be
the instruction set for standard hardware. This is, in fact, the usual choice for
conventional languages. However, the distance between typical machine architectures
and the computational model for λProlog is too significant to bridge in one step.
Moreover, these differences make it difficult to visualize and to state precisely the
optimizations that can be performed on particular instruction sequences that a
compiler might generate. For this reason, we introduce an intermediate level "abstract
machine" for λProlog. We describe the structure of this abstract machine in this
chapter, interleaving with this description a presentation of the process of
compiling λProlog programs into instructions for this machine. Our abstract machine
inherits its basic structure from the developments related to compiling Prolog
programs that have resulted in the abstract machine designed by Warren for that
language [63]. We also make use of a previous machine designed by Nadathur
and colleagues for λProlog [29, 40, 41] that underlies Version 1 of the Teyjus
implementation of this language [45]. However, unlike this earlier implementation,
which tackled full higher-order unification using Huet's procedure, we exploit the
possibility of using the higher-order pattern unification algorithm described in
Chapter 4. This choice simplifies the structure of the abstract machine considerably, leads
to optimizations in the treatment of types as we discuss in the next chapter and also
has the potential for impacting the overall runtime performance of λProlog programs.

This chapter is organized as follows. We introduce the basic processing model
underlying the new abstract machine in Section 6.1; as mentioned already, this model
is based on the Warren Abstract Machine (WAM) for Prolog [63], with which we shall
assume the reader has some familiarity. Section 6.2 and Section 6.3 then
discuss the details of the enhancements to this model that are needed for handling
the higher-order features of λProlog. Specifically, Section 6.2 addresses the treatment
of generic and augment goals, and Section 6.3 discusses how higher-order pattern
unification is embedded into the overall processing. A complete example of a compiled
λProlog program is presented in Section 6.4. Section 6.5 sketches the treatment of
flexible and disjunctive goals. Significant aspects of the treatment of generic and
augment goals and almost all of the treatment of flexible and disjunctive goals are
inherited from the earlier abstract machine for λProlog but their presentation is,
nevertheless, needed here for the sake of completeness.

Our focus in this chapter will be on the conceptual structure of the new abstract 
machine and the processing model embodied by it. This design has been realized in an 
actual implementation of λProlog (Version 2 of the Teyjus system), a presentation
of which appears in Chapter 8. 

6.1 The Processing Model 

The WAM provides a basic framework for compiling the aspects of control and uni- 
fication that are part of the computation in Prolog-like languages. These aspects 
appear also in λProlog and so we can use this structure in our abstract machine as
well. In this context, we note that the compilation of control refers to the translation 
of the dynamic analysis of the structure of complex goals carried out by the abstract 
interpreter described in Chapter 4 into low-level abstract machine instructions. The 
compilation of unification, on the other hand, corresponds to using knowledge of one 
half of the disagreement pairs to reduce the amount of work that needs to be done 
at runtime. Specifically, this translates into generating instructions for analyzing the 
structure of terms that arrive in argument registers when attempting to match with 
the head of a clause and for correspondingly setting up the argument registers when 
calling predicates. 

The basic WAM model is enhanced in our implementation in order to support the 
richer set of features present in λProlog. First, our compilation treatment of control
computations should include that of generic and augment goals in addition to the
set of goal structures contained in the Horn clauses that underlie Prolog. Second,
in comparison to the first-order setting, the unification operation of interest to us deals
with a richer term structure and involves a more complicated notion of equality. To 
accommodate this, we make the following additions. To treat the richer equality 
notion, we utilize invocations to a head normalization procedure at relevant points 
in the computation. Then we partition the unification computation into first-order 
and higher-order parts, so that the former can be handled by (compiled) WAM style 
instructions, and the latter by an auxiliary interpretive procedure that is based on the 
higher-order pattern unification algorithm. Notice that this partitioning is something 
that must happen dynamically because whether the unification problem is a first- 
order or a higher-order one depends also on a term whose structure is known only 
at runtime. To deal with this, we build appropriate machinery into the abstract 
machine instruction set that is responsible for recognizing and delaying the higher- 
order parts of unification, and for invoking the interpretive phase of unification at 
chosen computation points. Further, we provide devices for delaying the unification 
problems that are recognized to be beyond the Lλ subset during the interpretive
phase and for carrying them across goal invocations, to be re-examined when variable 
bindings may have altered their status. 

The details of the additional support that is summarized above are presented in 
the next two sections. The rest of the discussion in this section provides a sketch of 
the memory structure of our abstract machine and the underlying processing model 
that will be needed in order to explain these details. 

The basic data areas in our abstract machine consist of a code area, a heap, a
stack, a collection of registers, a push down list (PDL) and a trail. The first four
categories of data areas are familiar from conventional machine architectures although 
some of them have different actual purposes in our setting. The code area contains the 
compiled forms of clauses that constitute the definitions of predicates. The heap is a 
global memory space for holding data that is accessible at any point of computation; 
specifically, this is where complex terms that survive after the successful completion 
of a goal must be placed. One of the uses of the stack, that is similar to the use 
made of it in conventional languages, is to record environment frames for calls to 
particular clauses that constitute the definition of a predicate; such frames will store 
register images and other relevant data that need to be maintained between calls to 
goals that are part of the body of the clause in question. The stack is also used 
to store information for handling nondeterminism, a feature that is peculiar to logic 
programming languages. In particular, when alternatives are available during clause 
selection, the contents of relevant registers should be saved, so that the execution 
context can be recovered when it is necessary to attempt a different clause choice, i.e., 
when backtracking occurs. Such information is maintained in structures called choice 
points, which are interleaved with environment frames on the stack. In our abstract 
machine, the stack is also used to maintain information that is needed to support 
augment goals. We defer discussion of this usage till the next section. Registers are 
of two kinds: those that store data and those that are needed for execution control. 
Examples of the former include a set of data registers, A1, ..., An, that are used for
passing arguments across calls to clause definitions, and the S register that points
to the next argument of a complex first-order term (which is an application with a
rigid head). The set of registers relevant to execution control consists of the program
pointer, P, the continuation pointer, CP, the top of the heap register, H, the most 
recent environment frame register, E, the most recent choice point register, B and the 
top of the trail register, TR. Both sets of registers will be enriched to support
higher-order features, as we discuss in the next two sections. The push down list, PDL, is
used within the interpretive unification process for recording the subproblems that
are created by the process of term simplification discussed in Section 4. The trail area
also assists the branching behavior: it records images of the fragments
of the heap and stack that need to be recovered upon backtracking.

/* copy a a. */

C1: { Set up a choice point on the top of the stack and record that the next candidate
      clause is available at C2. }
L1: { Unify arguments of the incoming goal with those of the clause head. }
    { Return by continuing from the continuation point. }

/* copy (app T1 T2) (app T3 T4) :- copy T1 T3, copy T2 T4. */

C2: { Recover relevant registers from the information in the latest choice point on the
      stack, update the choice point and record that the next candidate clause is
      available at C3. }
L2: { Set up an environment frame on the top of the stack. }
    { Unify arguments of the incoming goal with those of the clause head. }
    { Set up arguments for copy T1 T3. }
    { Shrink the environment frame, update the continuation point to the next
      instruction and call copy. }
    { Set up arguments for copy T2 T4. }
    { Remove the latest environment frame from the stack. }
    { Call copy. }

/* copy (abs T1) (abs T2) :- Pi c\ (copy c c => copy (T1 c) (T2 c)). */

C3: { Recover relevant registers from the information in the latest choice point on the
      stack, and remove the choice point. }
L3: { Set up an environment frame on the top of the stack. }
    { Unify arguments of the incoming goal with those of the clause head. }
H1: { Carry out control actions for entering a generic goal and then an augment goal. }
    { Set up arguments for copy (T1 c) (T2 c). }
    { Shrink the environment frame, update the continuation point to the next
      instruction and call copy. }
H2: { Carry out control actions for leaving an augment goal and then a generic goal. }
    { Remove the latest environment frame from the stack. }
    { Return by continuing from the continuation point. }

Figure 6.1: Compiled computations underlying the program copy.

Computations occur within our abstract machine from executing a sequence of 
instructions that are generated from compiling goals, which correspond to the user 
query or the bodies of clauses, or from compiling the selection of a clause for a 
predicate and the subsequent unification with the head of the clause. The compilation 
of a goal is organized as follows. First, instructions are generated to realize the 
processing of the logical symbols that appear in a complex goal. Eventually, an atomic 
goal is reached. At this point, instructions are produced to set up the arguments of 
the goal in the argument registers; if these arguments are complex terms or variables, 
they will reside in either the heap or in the environment frame and the relevant 
registers will contain references to these structures. The last instruction for the 
atomic goal will be a call to the code for the predicate in question. Apart from 
transferring control to the (next) relevant clause for a predicate, the code for clause
selection has the responsibility of setting up a choice point in the stack to represent 
the remaining alternatives. The first action that the code for unification with the 
head of a selected clause must do is set up an environment frame if one is needed. 
The remaining instructions are responsible for carrying out the needed unification 
between the arguments appearing in the clause head and the ones passed in the 
argument registers from the invocation of the atomic goal. If this unification is 
successful, computation passes to the instructions arising from the compilation of the 
goal constituting the clause body, whose treatment we have already described. If 
this goal is solved successfully, then computation must return to the caller and the 
last instruction for the clause body will have the effect of realizing this. Notice that 
the environment frame that was created for this clause can be released at this point 
provided it is not needed for backtracking, in which case it will be protected by a 
choice point that appears above it in the stack. Of course, failure can occur in the 
course of unification with the head of a clause. This triggers a backtracking procedure
whose first task is to carry out a resetting of the heap and stack state to what it was at 
the current most recent choice point. The information for such a resetting is stored in 
the trail and, hence, this process is referred to as the "unwinding of the trail." Once 
this is done, the relevant registers are restored from the information available from 
the most recent choice point and computation proceeds to the next clause definition 
(also recorded with the choice point), after updating or discarding the choice point 
itself depending on whether or not further alternatives are available.
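
The interplay between choice points and the trail can be sketched with a small Python model. This is a simplified illustration under assumed names (`Machine`, `try_clause`, `bind` and `backtrack` are hypothetical): it tracks only a heap of cells, and it always discards the choice point on backtracking, whereas the scheme described above updates the choice point instead when further alternatives remain.

```python
class Machine:
    def __init__(self):
        self.heap = []            # each cell is a [tag, value] pair
        self.trail = []           # (address, old cell contents) pairs
        self.choice_points = []   # (heap top, trail top, next clause) triples

    def new_var(self):
        self.heap.append(["var", None])
        return len(self.heap) - 1

    def try_clause(self, next_clause):
        # Save what is needed to resume with the alternative clause.
        self.choice_points.append((len(self.heap), len(self.trail), next_clause))

    def bind(self, addr, value):
        # Only cells older than the most recent choice point need trailing:
        # newer cells are discarded wholesale when the heap is reset.
        heap_top = self.choice_points[-1][0] if self.choice_points else 0
        if addr < heap_top:
            self.trail.append((addr, list(self.heap[addr])))
        self.heap[addr] = ["ref", value]

    def backtrack(self):
        heap_top, trail_top, next_clause = self.choice_points.pop()
        while len(self.trail) > trail_top:    # "unwinding of the trail"
            addr, old = self.trail.pop()
            self.heap[addr] = old
        del self.heap[heap_top:]              # drop cells created since the choice point
        return next_clause


m = Machine()
x = m.new_var()       # created before the choice point
m.try_clause("C2")
y = m.new_var()       # created after the choice point
m.bind(x, "a")        # trailed
m.bind(y, "b")        # not trailed
resume_at = m.backtrack()
```

After `backtrack`, the cell for `x` is an unbound variable again, the cell for `y` no longer exists, and execution would resume at the recorded alternative.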

The control computations are optimized within our abstract machine in a manner
similar to that in the WAM. First, upon making a call, the environment frame
of the caller is dynamically shrunk by discarding permanent variables whose binding
information is no longer needed for solving the goals in the clause body that remain to
be processed; this process is commonly referred to as "environment trimming."
Second, when a clause body is constructed from a sequence of conjunctions and the last
conjunct is atomic, last call optimization [62] is performed. Essentially, the caller's
environment frame is deallocated from the stack before computation actually proceeds
to the callee, and the call is carried out after setting the continuation point register
to the continuation point passed to the caller, so that the callee can return directly
to its grandparent in the call graph. This optimization subsumes the traditional tail
recursion optimization in the logic programming setting.
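
The effect of last call optimization on the stack can be illustrated by a toy simulation. The function below is a hypothetical sketch, not part of the abstract machine: it models a chain of clauses in which each clause body ends with a call to the next clause, and reports the deepest stack of environment frames reached together with the continuation that the final callee would return to.

```python
def max_stack_depth(n_calls, last_call_opt):
    """Simulate n_calls clauses, each of whose bodies ends in a call to the next."""
    stack = []                    # environment frames
    cp = "return-to-query"        # continuation point register
    deepest = 0
    for i in range(n_calls):
        stack.append({"clause": i, "saved_cp": cp})    # allocate a frame
        deepest = max(deepest, len(stack))
        if last_call_opt:
            # Deallocate the caller's frame before the call and hand its own
            # continuation to the callee.
            cp = stack.pop()["saved_cp"]
        else:
            # Ordinary call: the callee must return into this clause.
            cp = f"after-call-in-clause-{i}"
    return deepest, cp


with_lco = max_stack_depth(5, last_call_opt=True)
without_lco = max_stack_depth(5, last_call_opt=False)
```

With the optimization the stack never holds more than one frame and the last callee returns directly to the original continuation; without it, one frame per pending call accumulates.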

We illustrate the compilation model and the associated processing scheme that 
we have described relative to the simple λProlog program appearing in Figure 2.1
that defines the copy predicate. A high-level pseudocode description of the compiled
program in our implementation is contained in Figure 6.1. 

A final aspect to be mentioned with regard to the compilation model is the
optimization corresponding to the detection of determinism. The runtime treatment
of nondeterminism involves the manipulation of choice points, which is known to be
costly and can often be eliminated by utilizing the structure of the actual arguments of
atomic goals to prune choices early during execution. For this purpose, a special set
of instructions is included that allows clause choices to be indexed by the head
arguments. Taking the copy example, the instructions in Figure 6.2 can be added to those
in Figure 6.1 for the purpose of indexing.

copy: { Switch on the head of (the head normal form of) the first actual argument of copy:
          variable:        continue with the instruction at C1.
          de Bruijn index: continue with the instruction at C1.
          constant:        continue with the instruction at S. }
S:    { Switch on the given constant:
          a:   continue with the instruction at L1.
          app: continue with the instruction at L2.
          abs: continue with the instruction at L3. }
C1:   ...

Figure 6.2: Indexing on copy.
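
The dispatch of Figure 6.2 can be mimicked by a small Python function. This is a sketch under stated assumptions: the head of the first argument is supplied as a hypothetical (kind, name) pair, the labels are those of Figure 6.1, and the "fail" outcome for a constant with no matching clause is an assumption, since the figure does not show that case.

```python
def switch_on_copy(first_arg_head):
    """Return the label at which execution of copy continues.

    first_arg_head is a (kind, name) pair describing the head of the
    head normal form of the first actual argument.
    """
    kind, name = first_arg_head
    if kind in ("variable", "de_bruijn"):
        # No pruning is possible: fall back to trying the clauses in order.
        return "C1"
    if kind == "constant":
        table = {"a": "L1", "app": "L2", "abs": "L3"}
        # Assumed behavior: no clause head can match any other constant.
        return table.get(name, "fail")
    raise ValueError(f"unknown head kind: {kind}")
```

Entering at L1, L2 or L3 bypasses the choice point instructions at C1, C2 and C3, so a call whose first argument has a known constant head commits to a single clause without creating a choice point.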

6.2 Compiling the New Search Primitives in λProlog

We now consider the extensions to the basic processing model to deal with generic and
augment goals. Our discussion only sketches these extensions to the extent needed
for a complete description of our abstract machine. A more thorough treatment may 
be found in [41]. 

As described in the previous chapters, the presence of generic goals requires a more 
careful treatment of unification. More specifically, to deal with the scoping effect of 
such goals on names, universe levels are associated with constants and logic variables 
and are examined and adjusted by the unification process. The determination of 
the appropriate universe level in our abstract machine is based on a global universe 
counter, which is initialized at the top-level query and is increased or decreased upon
entering or leaving each generic goal. This global universe counter is maintained in 
a new register called UC. This register is incremented and decremented by two new 
instructions, incr_universe and decr_universe, respectively. Some of the actions in 
the WAM based model are also modified to facilitate the proper manipulation of 
the universe counter. The contents of the UC register are stored in choice points
so that this register can be restored upon backtracking. These contents are also 
recorded in environment frames; the instructions that create terms corresponding 
to the arguments of atomic goals appearing in the body of a clause and possibly 
embedded within generic goals may need the old value in this register for tagging
variables that are bound by the implicit quantifiers at the clause level. 

It is necessary also to deal with the direct effects of a generic goal: such a goal 
must give rise to a new constant that is tagged with the (incremented) value of the 
UC register and that must then be substituted in the body of the goal for the
quantifier variable. In our abstract machine, we deal with these requirements by assigning
a slot in the environment frame to the quantified variable (thereby treating it as a
permanent variable in WAM terminology) and by storing the appropriate constant
in this slot. These actions are carried out by a new instruction called set_univ_tag:
as expected, this instruction takes as operands a displacement in the environment
frame and a constant. As a concrete example of the design above, the
pseudo-instructions from label H1 to H2 in Figure 6.1 that correspond to the generic goal
Pi c\ (copy c c => copy (T1 c) (T2 c)) can take the following structure.

H1 : { incr_universe }
     { set_univ_tag <offset to the environment frame>, c }
H3 : { Carry out control actions for entering an augment goal. }
     { Instructions for copy (T1 c) (T2 c). }
H4 : { Carry out control actions for leaving an augment goal. }
H2 : { decr_universe }
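
The bookkeeping performed by these instructions can be sketched in Python. This is a minimal model under assumptions: the class `Machine`, the helper `may_bind`, the tuple layout of terms and the initial value 0 of the counter are all hypothetical. The check in `may_bind` reflects the usual scoping condition, assumed here, that a logic variable may only be instantiated with constants whose universe level does not exceed the level recorded with the variable.

```python
class Machine:
    def __init__(self):
        self.uc = 0     # UC register; the initial value 0 is an assumption
        self.env = {}   # slots of the current environment frame

    def incr_universe(self):
        self.uc += 1

    def decr_universe(self):
        self.uc -= 1

    def set_univ_tag(self, slot, name):
        # Store a constant tagged with the current universe level in the slot.
        self.env[slot] = ("const", name, self.uc)

    def new_var(self, name):
        # A logic variable is tagged with the universe level at its creation.
        return ("var", name, self.uc)


def may_bind(var, const):
    """Assumed scoping check: the constant must not be newer than the variable."""
    return const[2] <= var[2]


m = Machine()
x = m.new_var("X")        # created outside the generic goal
m.incr_universe()         # entering Pi c\ ...
m.set_univ_tag(1, "c")    # the new constant lives in environment slot 1
c = m.env[1]
z = m.new_var("Z")        # created inside the generic goal
outer_ok = may_bind(x, c)
inner_ok = may_bind(z, c)
m.decr_universe()         # leaving the generic goal
```

Here the variable X, which exists before the generic goal is entered, cannot be bound to the new constant c, while the variable Z created within the scope of the goal can.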

Goals in λProlog could also have the form (Sigma x\ G), i.e., they could be
explicitly existentially quantified. Such goals may be permitted in Prolog too, but, 
because of the simple syntactic structure of goals in that setting, in particular, the 
absence of generic goals, such goals can be treated statically by moving the existential 
quantifiers out into universal ones over the entire clause and can then be treated via 
standard techniques. In our case, we can almost use the same scheme. There is, 
however, one exception: the particular location of the existential quantifier may have 
an impact on what universe index is to be stored with the variable. To accommodate 
this, we add a further instruction that is called tag_exists to our abstract machine. 
This instruction takes a variable, which is eventually a stack or heap location, as an 
argument and sets its universe index to the value currently in the UC register. 

The semantics of an augment goal D => G require the addition of D to the
existing set of program clauses before the processing of G, and the retraction of these
added clauses upon the successful solution of G. The searching mechanism used
for clause selection therefore has to support dynamic modifications to the available
predicate definitions. To realize this, a memory component called an implication point
is introduced. These implication points are stored on the stack and a new register, the
I register, is introduced to record the most recent implication point. Each implication
point also records the most recent implication point at the time of its creation; in
other words, the implication points themselves form a stack. Suppose
that D provides (additional) clauses for the predicates {p1, ..., pn}. Then one of the
components contained in the implication point corresponding to the addition of D is
a search table that will ultimately yield a pointer to the compiled form of the code for
each of these predicates. If no entry is found for a particular predicate when searching
from this implication point, the search continues from the implication point that this
one points to; thus, the overall program context existing at any stage of computation
is completely defined by the contents of the I register. The implication point also
contains a next clause table of size n that provides pointers to the definition (or code)
for each of the predicates p1, ..., pn that existed at the time of its creation paired with
the implication point that corresponds to this definition. This table complements a
special instruction called trust_ext to complete the compiled form of the code for the
predicates p1, ..., pn as we describe later. Notice that the right next clause table to
use is determined by the implication point that added the code currently being tried
for the relevant predicate. To isolate this implication point, we add to the abstract
machine yet another register called CI.

Two new instructions are introduced to support the compilation of an augment
goal. The push_impl_point instruction is used upon entering an augment goal for the
creation of an implication point. This instruction is also responsible for setting up the
next clause table for the implication point, something that is done by searching the
program context given by the current contents of the I register for definitions for each
of the relevant predicates. The push_impl_point instruction takes as argument a pointer
to a compile-time prepared table that contains information about the predicates for
which code is being added and also pointers to the specific code that needs to be
included. Symmetrically, the instruction pop_impl_point serves to remove the latest
implication point from the stack upon leaving an augment goal. This action is carried
out simply by setting the I register to the implication point reference stored in the
one that this register currently points to. Considering the copy example, the
pseudo-instructions labeled from H3 to H4 that correspond to the augment goal
(copy c c => copy (T1 c) (T2 c)) can now take the following form.

H3 : { push_impl_point t }
     { Instructions for copy (T1 c) (T2 c). }
H4 : { pop_impl_point }

We assume above that t is a pointer to a table prepared for the addition of the clause 
copy c c to the existing collection of predicate definitions.
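
The chain of implication points and the associated code search can be modeled by the following Python sketch. The names `ImplPoint`, `Machine` and `find_code`, the use of dictionaries for tables, and the string labels for code are all hypothetical simplifications; in particular, statically compiled code is represented by a base table that is consulted when no implication point defines the predicate.

```python
class ImplPoint:
    def __init__(self, search_table, next_clause_table, prev):
        self.search_table = search_table            # predicate -> added code
        self.next_clause_table = next_clause_table  # earlier definitions
        self.prev = prev                            # previous implication point


class Machine:
    def __init__(self, base_program):
        self.base = base_program   # predicate -> statically compiled code
        self.I = None              # I register: most recent implication point

    def find_code(self, pred):
        """Search from the I register down the chain of implication points."""
        ip = self.I
        while ip is not None:
            if pred in ip.search_table:
                return ip.search_table[pred], ip
            ip = ip.prev
        return self.base.get(pred), None

    def push_impl_point(self, table):
        # For each added predicate, record the definition that existed before
        # the addition, paired with the implication point that provided it.
        next_clauses = [self.find_code(p) for p in table]
        self.I = ImplPoint(dict(table), next_clauses, self.I)

    def pop_impl_point(self):
        self.I = self.I.prev


m = Machine({"copy": "copy_static"})
m.push_impl_point({"copy": "D1"})     # entering (copy c c => ...)
inside, _ = m.find_code("copy")
older = m.I.next_clause_table[0]      # what a trust_ext for copy would consult
m.pop_impl_point()                    # leaving the augment goal
outside, _ = m.find_code("copy")
```

Note that the list used for the next clause table here is indexed from 0, whereas the index operand of trust_ext in the text starts from 1.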

Code that is added dynamically for a predicate must allow for the possibility
that it is extending an already existing definition. To support this situation, the
code that is normally generated from the clauses for the predicate is enclosed within
a try_me_else and a trust_ext instruction. The leading try_me_else sets up a choice
point with the indication that the alternative definition starts from the trust_ext
instruction at the end of this segment of code. The trust_ext instruction takes as
argument an index into a next clause table. It first retrieves a
pointer to the next clause to try for the predicate from the next clause table stored
in the implication point referenced by the CI register and resets this register to
the associated implication point also obtained from this table. It then transforms the
rest of the computational context as needed for backtracking by using the contents
of the current choice point, which it then discards.

A subtle but important point to be noticed about the clauses that appear in 
augment goals is that these may contain free variables in them. For example, consider 
the following generic goal that appears in one of the clauses for the copy predicate: 

Pi c\ (copy c c => copy (T1 c) (T2 c))

Recall that the quantified variable c is treated as a variable for which space is allo- 
cated in the environment record for the parent copy clause. Further, the processing of 
the universal quantifier results in a constant (with appropriate universe index) being 
bound to this variable. When interpreting the embedded clause copy c c, therefore, 
it is important to have available the environment record of the parent clause in order 
to interpret the "variable" corresponding to the occurrences of c. In short, we treat 
clauses as closures, to be interpreted relative to an environment that is pointed to 
by a special register called CE. Use is made of a new instruction called init_variable
whenever it is necessary to get the binding for a variable from the "parent" envi- 
ronment. This instruction takes two arguments: a register or an environment slot 
designating the location of the variable local to the clause being considered and the 
environment slot for the parent clause from which the binding must be obtained. The 
instruction uses its two arguments to tie these two variables together. 

As an illustration of the discussion of the compilation of embedded clauses, the 
clause copy c c that occurs within the generic goal just considered would be compiled 
into the following sequence of (pseudo-)instructions: 

D1 : { try_me_else D2 }
     { init_variable (local location of c), Yi }
     { Code for unifying first two argument registers
       with variable denoting c local to this environment. }
     { Return control to the continuation point. }
D2 : { trust_ext 1 }

Here Yi denotes the location of the slot assigned to the universally quantified variable 
corresponding to c in the environment record pointed to by the CE register. It is,
of course, necessary to set this register appropriately for each clause that is being
tried. To facilitate this, a pointer to the relevant environment record is stored in the
implication point at the time that it is set up. Notice also that the index for the
trust_ext instruction here is 1 because there is code for exactly one predicate that is
added by the associated augment goal. 

A final point concerns the instructions for invoking the code for predicates. As 
we have noted in this section, the entry point into such code can change during 
execution. For this reason, we need a special set of calling instructions that will 
initiate the search for appropriate code from the implication point referenced by the 
I register. These instructions will, for instance, have to be used for any calls to the
copy predicate whose compilation we have just considered. Note, however, that the 
old WAM style calling instructions are also retained in our abstract machine. These 
can be used for predicates whose code cannot be altered dynamically. Moreover, it is 
preferable to use them wherever possible because the address to which control needs 
to be transferred then does not need to be calculated at runtime. 

6.3 Compilation of Higher-Order Pattern Unification 

We now turn our attention to providing support for higher-order pattern unification.
We first consider extensions for this purpose to the data areas present in the original
structure of the WAM. These extensions are of two kinds: the introduction of new
devices, and enhancements and modifications to the ones already present in the WAM.
The specifics of these changes are as follows. First, we add new registers called Head,
ArgVector, NumArgs and NumAbs that provide access to the head, the arguments,
the number of arguments and the binder length of a head-normal form right after it
has been computed. Second, in addition to the role it plays in realizing the interpretive
unification process, the PDL is also used to temporarily maintain higher-order
unification problems that are delayed when executing the compiled form of unification
arising from matching with the clause head. Third, unification problems that lie
outside the Lλ subset need to be carried as constraints across goal invocations, and
the heap is used to maintain such problems in the form of a list of disagreement pairs.
The beginning of this list is recorded in a new register called LL. The heap is further
used to store the terms that are created in the course of head-normalization and in
the binding phase of pattern unification. In the intended scheme, β-contractions are
carried out destructively during head-normalization so as to share the effects of such
rewriting steps. Since it may be necessary to undo these mutations on backtracking,
we also change the trail so that it additionally maintains a record of any such
mutations that arise during processing.
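
The role of the extended trail can be illustrated by a minimal Python sketch. The
classes Cell and Trail and the tuple encoding of terms are assumptions made for this
example only; the sketch shows the general idea of recording old cell contents so
that a destructive rewriting step can be undone, not the actual trail layout.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class Cell:
    """A mutable heap cell; `value` is whatever term encoding is in use."""
    value: Any


@dataclass
class Trail:
    """Value trail: remembers (cell, old contents) for every recorded overwrite."""
    entries: List[Tuple[Cell, Any]] = field(default_factory=list)

    def mark(self) -> int:
        # A choice point would store this height.
        return len(self.entries)

    def overwrite(self, cell: Cell, new_value: Any) -> None:
        # Record the old contents before mutating, so the step can be undone.
        self.entries.append((cell, cell.value))
        cell.value = new_value

    def undo_to(self, mark: int) -> None:
        # Backtracking: restore cells in reverse order of mutation.
        while len(self.entries) > mark:
            cell, old = self.entries.pop()
            cell.value = old


# The redex ((x\ x) a) stored in one cell and contracted in place.
heap_cell = Cell(("app", ("lam", ("idx", 1)), ("const", "a")))
trail = Trail()
choice_point = trail.mark()
trail.overwrite(heap_cell, ("const", "a"))    # destructive beta-contraction
contracted = heap_cell.value
trail.undo_to(choice_point)                   # backtrack past the contraction
restored = heap_cell.value
```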

As mentioned in Section 6.1, the unification on the arguments of a clause essentially
consists of a first-order and a higher-order part, whereas WAM style instructions
for unification are only sufficient for handling the former. Our abstract machine still
uses the WAM style instructions to solve the first-order subproblems, and delays the
higher-order ones by pushing them onto the PDL. The problems left on the PDL
in this way are examined by an interpretive pattern unification procedure that is
invoked as the culminating instruction in the sequence that realizes unification with
the clause head. The structure of the unification part of the processing model can
thus be described schematically as follows:

{ For each argument in the clause head:
      instructions for carrying out the first-order part of unification and
      postponing the higher-order part onto the PDL. }

{ Invoke the interpretive pattern unification procedure on the PDL. }

Now we consider the compilation of the unification on each pair of arguments. 
Compared with what has to be dealt with by the WAM, the following new issues 
arise in our setting. First, a richer collection of term structures participate in the 
computation. Second, a head normalization procedure has to be invoked to bring 
terms into comparable forms at the necessary points. Finally, relevant instructions 
have to be enhanced with the ability to properly separate higher-order subproblems 
from first-order ones, taking the necessary steps to solve the latter while pushing 
the former onto the PDL. Taking these issues into account, the processing in our 
implementation can be described by the unify procedure in Figure 6.3. The first 
argument to this procedure is the argument from a clause head, i.e., whose structure 
is statically known, and is assumed to be normalized at compilation time. The second 
argument is the one dynamically appearing at runtime. It should also be noted 
that the actions carried out in compilation and at runtime are both present in this 
procedure, and we use bold letters to distinguish the latter. 

unify(t, s)
  switch on the structure of t:
    case λ(n, t'):
      create t on the heap
      interp_unify(t, s)
    case (F a1 ... an), where F is a variable and n > 0:
      let t' be a term of the form
        (F a1 ... an), where F is a new logic variable,
          if this is the first occurrence of the variable in the clause;
        (f a1 ... an), where f is the term to which the variable F is bound,
          if this is a subsequent occurrence of the variable in the clause
      create t' on the heap
      interp_unify(t', s)
    case X, where X is a variable:
      if this is the first occurrence of X in the clause, then bind(X, s)
      else interp_unify(t', s), where t' is the term to which X is bound
    case (c a1 ... an), where c is a constant and n ≥ 0:
      head_norm(s)
      if s is (r' b1 ... bm), where r' is rigid and m ≥ 0
      then if r' ≠ c or n ≠ m then backtrack
           else for 1 ≤ i ≤ n: unify(ai, bi)
      else
        create t' as (c X1 ... Xn) on heap, where Xi are new variables
        if s is a logic variable X
        then if uc(X) < uc(c) then backtrack
             else bind(X, t')
        else /* s must be a higher-order term */
             push the pair (t', s) onto PDL
        for 1 ≤ i ≤ n: unify(ai, Xi)

Figure 6.3: The unification model in our compilation implementation.

The auxiliary functions interp_unify and head_norm in unify denote the interpretive
pattern unification and head normalization procedures respectively. A call to the
procedure bind of the form bind(X, t) essentially carries out the action of binding a
logic variable X to the term t. In the situation when X is from a static argument
of the clause head, the logic variable is not explicitly created but is, rather, given by
a data register or a slot in the environment frame. Binding in this case is carried out
by placing a reference to the term t in the relevant place. Finally, in the case when
the static term t input to the unify procedure is a first-order application and the
dynamic term s is a logic variable or higher-order term, the recursive calls to unify
simply serve to construct the arguments of t on the heap. For this reason, it is not
necessary to actually create the new variables Xi that are used in the presentation
of the pseudo code. Instead, space is allocated on the heap for an argument vector
of size n and the recursive calls to unify enter a term creation mode, known as the
WRITE mode in contrast to the READ mode that is used when term structure needs
to be analyzed, during which the arguments of t are created and references to them
are placed into the relevant slots in the argument vector.
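
The separation of first-order and higher-order work in Figure 6.3 can be mimicked
by the following Python sketch. It is a simplification under stated assumptions
rather than the procedure itself: terms are tuples with the made-up tags "const",
"var" (clause variable), "lvar" (runtime logic variable), "lam" and "flex"; the
dynamic term is taken to be already dereferenced and head-normal; the universe
check is omitted; and every hand-over to the interpretive procedure, immediate or
delayed, is modelled by appending a pair to the list pdl.

```python
import itertools

_fresh = itertools.count()


def build(t, env):
    """WRITE mode: construct the static term t, allocating fresh logic
    variables for clause variables that are seen for the first time."""
    if t[0] == "var":
        if t[1] not in env:
            env[t[1]] = ("lvar", f"_G{next(_fresh)}")
        return env[t[1]]
    if t[0] == "const":
        return ("const", t[1], [build(a, env) for a in t[2]])
    return t          # higher-order static terms are taken as already built


def unify(t, s, env, store, pdl):
    """READ mode over a static term t and a head-normal dynamic term s.
    env maps clause variables to terms, store holds bindings of dynamic logic
    variables, pdl collects pairs left to the interpretive procedure.
    Returns False on a first-order clash."""
    if t[0] == "var":
        if t[1] not in env:
            env[t[1]] = s                      # first occurrence: bind
        else:
            pdl.append((env[t[1]], s))         # later occurrence: interpretive
        return True
    if t[0] != "const":                        # abstraction or flexible term
        pdl.append((t, s))
        return True
    if s[0] == "const":                        # rigid-rigid: first-order step
        if s[1] != t[1] or len(s[2]) != len(t[2]):
            return False
        return all(unify(a, b, env, store, pdl) for a, b in zip(t[2], s[2]))
    built = build(t, env)                      # switch to term creation
    if s[0] == "lvar":
        store[s[1]] = built                    # bind the incoming variable
    else:
        pdl.append((built, s))                 # higher-order s: delay
    return True


# Head argument (app X (abs (y\ X))) against (app a (abs M)).
head = ("const", "app", [("var", "X"),
                         ("const", "abs", [("lam", 1, ("var", "X"))])])
goal = ("const", "app", [("const", "a", []),
                         ("const", "abs", [("lvar", "M")])])
env, store, pdl = {}, {}, []
ok = unify(head, goal, env, store, pdl)
```

In this run the constants app and abs are matched by the compiled first-order
steps, X is bound to a, and the only pair left for the interpretive procedure is
the abstraction against the logic variable M.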

The conventional WAM style term creation and unification instructions are categorized
into the put, set, get and unify classes. Roughly mapping to the unify
procedure in Figure 6.3, the get class of instructions can be used to carry out the
actions required by the cases where the static term is a first-order application and
where it is a constant or variable that appears directly as an argument of the clause
head. When the unify procedure is invoked recursively over the arguments of the
(static) applications, the unifications over the embedded variables and constants can
be handled by the set of unify instructions. The put and set instructions are used in
the WAM solely for setting up the actual arguments of atomic goals and do not
get used in head unification. In our context, when the static term has a higher-order
structure, it has to be first created and then handed to the interpretive unification
process. The term creation actions are carried out by the put and set classes of
instructions, i.e., these instructions may be interleaved with get and unify instructions
in the compilation of head unification.

Within this picture, we now examine the enhancements to each category
of instructions for supporting the higher-order aspects of unification. Since the set
category of instructions is in fact a light-weight form of the unify class,
i.e., their actions are the same as those carried out by the unify instructions in the
WRITE mode, we do not discuss these separately in what follows.

In contrast to the first-order setting, term creation in our context has to deal
with a richer collection of structures. First, the head of (a head normal form of)
an application can be a de Bruijn index or a logic variable in addition to being a
constant. For this reason, the put_structure instruction in the WAM is generalized
into put_app. This instruction takes three arguments: a data (argument) register Ai,
a data register or an offset into an environment frame Xj, and a positive number n.
It first creates an application term on the heap with its head being
the term referred to by Xj and an empty argument vector of size n. Then Ai is set
to refer to the new application term and the S register is prepared to refer to the
beginning of the argument vector for the subsequent instructions to actually fill in
the arguments. The second source of higher-order structures is the appearance of de
Bruijn indexes and abstractions. For the creation of the former, new instructions

put_index Ai, n and unify_index n

are introduced. The first one is used for a de Bruijn index that is not directly an
argument of an application. Its execution constructs a term corresponding to the de
Bruijn index n on the top of the heap and sets the data register Ai to refer to it.
The unify_index instruction corresponds to an application argument. It can only be
invoked in the WRITE mode and its effect is to create a term corresponding to the
de Bruijn index n in the heap location given by the register S and to increment S
to point to the next argument vector slot. Similarly, the creation of an abstraction
λ(n, t) is realized by the pair of new instructions

put_lambda Ai, Xj, n and unify_lambda Xj, n,

depending on whether the abstraction appears directly as an argument of an
application. A reference to the term t is assumed to be contained in the data register or
environment offset Xj.
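
A toy model of these term creation instructions, written in Python, may help to
visualize how the S register walks over an argument vector. The class Machine, the
dictionary encoding of heap cells and the representation of S as a pair are
assumptions of this sketch; only the instruction names and their arguments come
from the description above.

```python
class Machine:
    """Toy model of the term creation instructions (cell layout is assumed)."""

    def __init__(self):
        self.heap = []      # heap cells; an address is an index into this list
        self.regs = {}      # data registers, e.g. "A1", holding heap addresses
        self.S = None       # (address of an application, next argument slot)

    def _new(self, cell):
        self.heap.append(cell)
        return len(self.heap) - 1

    def put_index(self, ai, n):
        # de Bruijn index that is not directly an argument of an application
        self.regs[ai] = self._new({"tag": "idx", "n": n})

    def put_app(self, ai, xj, n):
        # application with head regs[xj] and an empty argument vector of size n
        addr = self._new({"tag": "app", "head": self.regs[xj],
                          "args": [None] * n})
        self.regs[ai] = addr
        self.S = (addr, 0)

    def unify_index(self, n):
        # WRITE mode only: fill the slot designated by S, then advance S
        addr, slot = self.S
        self.heap[addr]["args"][slot] = {"tag": "idx", "n": n}
        self.S = (addr, slot + 1)

    def put_lambda(self, ai, xj, n):
        # abstraction of binder length n whose body is referred to by regs[xj]
        self.regs[ai] = self._new({"tag": "lam", "n": n,
                                   "body": self.regs[xj]})


# Building x\ y\ (x y), i.e. an abstraction of length 2 over (#2 #1).
m = Machine()
m.put_index("A2", 2)
m.put_app("A3", "A2", 1)
m.unify_index(1)
m.put_lambda("A1", "A3", 2)
```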

The instructions constructing compound terms assume that the head of an application
and the body of an abstraction are given by data registers. However, these
components can in particular situations correspond to permanent variables, which
reside in environment frames on the stack. In these situations, the relevant permanent
variables have to be globalized prior to use. To facilitate this, our abstract machine
includes the instructions

globalize Yi, Aj and globalize Ai.

The first one dereferences the permanent variable given by an offset into an environment
frame. If the resulting term still resides on the stack, it is copied to the top of
the heap, and both that stack cell and the data register Aj are set to refer to the
newly created heap cell. Otherwise Aj is made to be a reference to the dereferenced
result. The second instruction simply dereferences the given Ai, carries out the
globalizing actions described before if necessary and leaves a reference to the appropriate
heap term in Ai.
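
The effect of globalization can be sketched in Python as follows. The Cell class,
its area tag and the helper deref are hypothetical; the sketch only illustrates the
copy-to-heap step and the redirection of the stack cell described above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(eq=False)
class Cell:
    area: str            # "stack" or "heap"
    value: Any = None    # None: unbound variable; Cell: reference; else a term


def deref(cell):
    """Follow reference chains to the cell that is finally designated."""
    while isinstance(cell.value, Cell):
        cell = cell.value
    return cell


def globalize(cell, heap):
    """Return a heap cell for `cell`, copying the dereferenced cell to the
    top of the heap if it still lives on the stack."""
    target = deref(cell)
    if target.area == "stack":
        copy = Cell("heap", target.value)
        heap.append(copy)
        target.value = copy      # the stack cell now refers to the heap copy
        return copy
    return target


heap = []
y1 = Cell("stack")               # an unbound permanent variable
a1 = globalize(y1, heap)         # plays the role of globalize Y1, A1
a2 = globalize(y1, heap)         # a second use finds the heap cell directly
```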

The get and unify instructions are used for carrying out compiled unification.
These instructions are enhanced to handle terms whose structures may be revealed
to be higher-order at runtime. Changes are made for the instructions

get_structure Ai, f, n, get_constant Ai, c and unify_constant c,

in which Ai is required to be a data register referring to the incoming term, f and c
are required to be constants and n is a number denoting the arity of the application.
Executing these instructions (in the READ mode for the last instruction) first invokes
the interpretive head normalization procedure on the term referred to by Ai for the
first two instructions and the one referred to by the S register for the last. Let the
resulting term be s; as already explained, its decomposition will be given by the
contents of the registers Head, ArgVector, NumArgs and NumAbs at the end of head
normalization. If s has a higher-order structure, i.e., if it is an abstraction or a flexible
application, a disagreement pair with the first term being (a reference to) s and the
second referring to the current top of heap or to the location given by S is created
on the PDL. In the situation when get_constant or unify_constant is executed, the
constant c is then created as the second term of the disagreement pair. When the
executed instruction is get_structure, the term pushed onto the top of heap is then an
application with an empty argument vector of size n and with its head referring to a
new constant term corresponding to f. Further, the S register is set to the first entry
of the argument vector, and execution proceeds to the following unify instructions
in WRITE mode. The unify_value Xi instruction is also changed so that when it
is executed in the READ mode, it causes the pattern unification procedure, rather
than the first-order unification procedure, to be invoked in interpretive mode on the
pair of terms given by the register or environment offset Xi and the S register. In
addition, a new instruction

pattern_unify Xj, Ai

is introduced as a variant of unify_value in the READ mode. This instruction appears
at the end of a sequence of put and unify (in the WRITE mode) instructions that
serves to create a higher-order term appearing in a clause head. This instruction also
invokes the higher-order pattern unification procedure in interpretive mode to unify
the created term that is referenced by Xj and the incoming term that is given by the
argument register Ai.

get_structure   A1, app, 2      % A1 = (app
unify_variable  A2              %         X
unify_variable  A3              %         A3)
get_structure   A3, abs, 1      % A3 = (abs
unify_variable  A4              %         A4)
put_lambda      A5, A2, 1       % A5 = λ(1, X)
pattern_unify   A4, A5          % A4 = A5

Figure 6.4: Compiled unification over a head argument (app X (abs (y\ X))).

For a concrete example of the usage of our unification and term creation instructions,
we can consider the compilation of the term (app X (abs (y\ X))) as an
argument within a clause head, assuming that app and abs are the constants that we
encountered in the copy program. The instructions resulting from a compilation of
this term are shown in Figure 6.4.

The instruction set for our abstract machine includes a new instruction called
finish_unify that is used at the end of the processing of the entire clause head. This
instruction invokes the interpretive pattern unification procedure over the disagreement
pairs that have been pushed onto the PDL during the head processing. Further,
if bindings to logic variables have actually occurred during head unification, the global
disagreement set recording non-Lλ problems generated from computation steps prior
to the processing of the current clause is also examined at this stage with the expectation
that some of them could actually become Lλ after the bindings. It is interesting
to note that this way of examining the global disagreement set could in theory lead
to bad performance: if a large number of non-Lλ pairs are carried along across the
solutions of atomic queries and only a relatively small portion of them actually becomes
Lλ after the processing of each clause head, then the repeated examination of the
contained disagreement pairs will be mostly redundant. This conceptual problem can
be solved by using a sophisticated freeze-wake mechanism proposed in [34]. Within
this scheme, an unsolvable disagreement pair is directly associated with the logic
variables contributing to it, and the re-examination is triggered only when a binding
of one of these logic variables actually occurs. However, the "extreme" case described
above in fact rarely occurs in the context that we are interested in: in most practical
λProlog programs, it is either the case that all the disagreement pairs are Lλ the first
time they are looked at, usually because the program itself has been written to adhere
to the Lλ style, or the case that a non-Lλ pair is transformed into an Lλ one at the
end of the processing of the clause head in which the pair was encountered. Based
on this observation, the simple processing scheme that we have chosen for delayed
disagreement pairs seems justified.
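
The trade-off between the two schemes can be made concrete with a small Python
sketch. It is only an illustration: the names rescan and WakeTable and the use of
string labels for disagreement pairs are assumptions of the example, and the
mechanism of [34] is considerably more refined than this.

```python
from collections import defaultdict


def rescan(delayed):
    """Simple scheme: after a head unification that bound some variable,
    every delayed pair is looked at again."""
    return list(delayed)


class WakeTable:
    """Freeze-wake scheme: each delayed pair is filed under the logic
    variables that keep it outside the pattern fragment."""

    def __init__(self):
        self.suspended = defaultdict(list)

    def freeze(self, pair, blocking_vars):
        for var in blocking_vars:
            self.suspended[var].append(pair)

    def wake(self, var):
        # Only pairs blocked on the variable just bound are re-examined.
        return self.suspended.pop(var, [])


delayed = ["p1", "p2", "p3"]     # labels standing for disagreement pairs
table = WakeTable()
table.freeze("p1", ["F"])
table.freeze("p2", ["G"])
table.freeze("p3", ["F", "H"])
```

When F is bound, the simple scheme examines all three pairs, whereas the wake
table returns only p1 and p3. The price is the bookkeeping needed to maintain the
table, which is not worthwhile if delayed pairs are rare.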

A final new instruction for our abstract machine is head_normalize Xi, which 
carries out the head normalization of a term referred to by the data register or 
environment offset given by Xi. This instruction is used in the term creation process 
needed for setting up the arguments of atomic goals when it is obvious that a higher- 
order structure has been created. The purpose of enforcing head normalization over 
such structures at an early stage is to reduce the overhead of backtracking. The actual 
arguments have to be in head normalized form during the unification operations 
carried out during the clause selection. If this normalization is done before a choice 
point corresponding to clause selection is created, then the process of undoing and 
then redoing it because of a backtracking internal to this selection process can be 
avoided. 

A comparison between the processing model we have described here and the one
underlying the implementation of Version 1 of the Teyjus system is in order. We focus
here only on the issues that have been discussed so far; more differences will arise
when we consider the treatment of types in the next chapter. In the earlier abstract
machine, the higher-order parts of the unification problems are separated from the
first-order ones in a way similar to our scheme and are also handed to an interpretive
unification procedure for their solution. However, due to the branching nature of
the unification procedure dealt with in that abstract machine, a more sophisticated
(and more costly) control mechanism has to be considered. In particular, in addition
to the choice point, a structure known as a branch point had to be introduced for the
purpose of recording choices in the incremental steps taken to solve rigid-flexible
pairs [40]. Further, these branch points have to be examined during backtracking for
attempting the next alternative. This also introduces further complexity in treating
choice points, at least in that they have to be differentiated from branch points so
that it is clear what action needs to be taken in the relevant cases. To avoid the
storage of redundant control information for effecting backtracking caused by the
branching of unification, special attention was paid in the design of that abstract
machine to the precise structure of a branch point. The creation and the maintenance
of branch points are carried out in that machine by an instruction that is also called
finish_unify. The necessity of branch points is entirely eliminated in our context
because we simply delay unification on any pairs that could cause branching. This
has led to a considerable simplification of the processing model and is also expected
to lead to improvements in the execution behavior over practical λProlog programs.



6.4 A Complete Example of Compilation

We are now in a position to show the complete sequence of instructions that would
be generated for the copy clauses shown in Figure 2.1. The code that we expect a
compiler to generate for the first two clauses is shown in Figure 6.5,
the code for the last clause appears in Figure 6.6, and Figure 6.7
contains the instructions for the embedded clause in the body of the last clause for
the predicate.

The instructions switch_on_term and switch_on_constant in Figure 6.5 are used for
indexing clause choices in the way described in Section 6.1. Specifically, the former
takes the form

switch_on_term V, C, L, BV

where V, C, L and BV are instruction addresses to which control must be transferred
when the head normal form of the term referred to by A1 is, respectively, a flexible
term, a rigid term with a constant head other than ::, a nonempty list, and a term
with a bound variable head. The label fail is assumed to be the location of code that
causes backtracking. The other instruction, switch_on_constant, carries out the
second-level indexing among different constant heads. Its first argument is a positive
number indicating the number of constants under consideration and its second argument
refers to a hash table in which the mapping from the constants to the addresses of
the corresponding clause definitions is stored.

copy :
        head_normalize  A1
        execute_name    copy            % copy A1 A2.

Figure 6.5: Instructions for the first two clauses of copy.

Figure 6.6: Instructions for the last clause of copy.

copy :  try_me_else     0, L9           % copy
        init_variable   A3, Y1
        pattern_unify   A3, A1          %   c
        pattern_unify   A3, A2          %   c.
        finish_unify
        proceed
L9 :    trust_ext       2, 1

Figure 6.7: Instructions for the dynamic clause of copy in the augment goal in its last
definition.
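
The two levels of indexing can be pictured by the following Python sketch. The
term encoding (tags "flex", "rigid" and "bvar"), the labels such as L_app and the
helper dispatch are all hypothetical, and the functions take the term explicitly
instead of reading register A1; the sketch is meant only to show the shape of the
case analysis.

```python
def switch_on_term(term, V, C, L, BV):
    """First-level indexing on the head normal form of the first argument."""
    kind = term[0]
    if kind == "flex":
        return V
    if kind == "bvar":
        return BV
    if kind == "rigid" and term[1] == "::":
        return L
    return C                     # rigid term with another constant head


def switch_on_constant(term, table, fail="fail"):
    """Second-level indexing: hash on the constant head."""
    return table.get(term[1], fail)


# Hypothetical labels for a predicate with clauses for app and abs only.
TABLE = {"app": "L_app", "abs": "L_abs"}


def dispatch(term):
    target = switch_on_term(term, "L_var", "L_const", "fail", "fail")
    if target == "L_const":
        return switch_on_constant(term, TABLE)
    return target
```

With these labels, a first argument headed by app is sent directly to L_app, a
flexible first argument is sent to code that tries every clause, and lists or terms
with a bound variable head cause immediate backtracking.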

Among the control instructions appearing in the figures, try_me_else, retry_me_else
and trust_me are used for the manipulation of choice points, and the former two have
as their second argument the address of the clause definition that should be attempted
upon backtracking. Their first numeric argument is used to indicate the
number of argument registers that are to be saved or retrieved as relevant. The
instructions allocate and deallocate are used for the creation and deletion of environment
frames on the stack. The argument of the former is a positive number
corresponding to the number of permanent variables that are to be allocated on the
frame. The calls to clause definitions that need to be dynamically determined are
handled by the instructions call_name and execute_name, whereas the return from a
clause definition is effected by the instruction proceed. The instruction execute_name
is specially intended for the last call optimization mentioned in Section 6.1. The
numeric argument of the call instructions is used to indicate the number of variables
that remain on the caller's environment frame at the time of the call. The instruction
trust_ext n, i in Figure 6.7 is used to search for dynamically extended clause
definitions in the way described in Section 6.2. The first argument n is the number
of argument registers that should be recovered before control is transferred to
the found clause definition. Figure 6.6 also illustrates the usages of the higher-order
control instructions push_impl_point, pop_impl_point, incr_universe and decr_universe,
the computations underlying which are described in Section 6.2.

Following the WAM convention, in the instructions shown in the figures, we have
used the name Yi to depict the ith variable that is allocated in the environment
frame. Also, the instructions unify_variable and put_value are identical to the ones
with the same name in the WAM and the instruction set_value is used as a special
case of unify_value in the WRITE mode.

6.5 Treatment of Flexible and Disjunctive Goals 

Up to this point, we have provided a conceptual picture of our abstract machine and
compilation model insofar as these relate to the treatment of higher-order pattern
unification. There are two issues that remain to be explained. First, the implementation
discussed so far assumes a monomorphic type system for our language,
within which no runtime processing of types is necessary. This restriction has to be
removed in the presence of the first-order polymorphic types on which our language
is actually based. A treatment of this aspect is deferred to the next chapter. Second,
it is not yet clear how flexible and disjunctive goals are handled. We discuss
these aspects in this section.

The appearance of flexible goals, i.e., of goals of the form (P t1 ... tn) where P is a
variable, embodies the ability to mix in our language meta and object level usages of
predicate expressions. A predicate definition that exploits this ability is shown below:

kind i type.

type mappred (list i) -> (i -> i -> o) -> (list i) -> o.

mappred nil P nil.

mappred (X :: L1) P (Y :: L2) :- P X Y, mappred L1 P L2.

Let bob, john, mary, sue, dick and kate be constants declared with type i, and let
parent be a constant of type i -> i -> o. Then the following additional clauses define
a "parent" relationship between different individuals.

parent bob john.
parent john mary.
parent sue dick.
parent dick kate.

In this context, a query of the form

?- mappred (bob :: sue :: nil) parent L

can be asked, and can be solved with the answer substitution {⟨L, john :: dick :: nil⟩}.
Following the operational semantics of our language specified in Section 4.2, it can
be observed that in the course of solving this query, two new goals

parent bob Y1 and parent sue Y2

will be dynamically formed and solved. Another example of a query is

?- mappred (bob :: sue :: nil) (x\ y\ (Sigma z\ (parent x z, parent z y))) L.

This goal asks for the grandparents of bob and sue and has as its solution the
substitution {⟨L, mary :: kate :: nil⟩}. Finding this answer requires two new goals of
complex structure, each with an embedded conjunction and existential quantifier,
to be constructed dynamically and then solved.

As illustrated by the mappred example, flexible goals may be instantiated by
terms containing predicate constants and with complex logical structures, thereby
dynamically reflecting object-level occurrences of quantifiers and connectives into
positions where they function as search directives.
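
The way in which mappred forms and solves goals from its predicate argument can be
imitated in Python, with relations modelled as generator functions. This is a loose
analogy under stated assumptions: it covers only the mode in which the first list is
given and the second is computed, and the names PARENT, parent, grandparent and
mappred are choices of this sketch.

```python
PARENT = [("bob", "john"), ("john", "mary"), ("sue", "dick"), ("dick", "kate")]


def parent(x):
    """The relation as a generator: yields every y such that parent x y."""
    return (y for (a, y) in PARENT if a == x)


def grandparent(x):
    """Counterpart of x\\ y\\ (Sigma z\\ (parent x z, parent z y))."""
    return (y for z in parent(x) for y in parent(z))


def mappred(xs, p):
    """Yield every list ys such that p relates xs and ys pointwise."""
    if not xs:
        yield []
        return
    for y in p(xs[0]):                   # the goal (P X Y), formed at runtime
        for rest in mappred(xs[1:], p):  # the recursive call on the tails
            yield [y] + rest


answers = list(mappred(["bob", "sue"], parent))
```

The call p(xs[0]) plays the role of the flexible goal: which relation is searched
is known only once mappred has been handed its second argument.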

The problem faced in supporting flexible goals is that instantiations of their heads
can change their structure dynamically, and so it is impossible to know at compile time
the specific control action that they would give rise to during computation. However,
we can provide a partial compilation in that we can use the top level structure of these
goals at runtime to pick between different compiled treatments of control structure.
In particular, flexible goals can be compiled into calls to a special procedure named
solve to which (the instantiated version of) the goal is provided as an argument. In
the case that the incoming goal has a complex structure, the behavior of solve can
be envisaged as if it were based on a compilation of the following clauses:

solve (G1 , G2) :- solve G1, solve G2.
solve (G1 ; G2) :- solve G1; solve G2.
solve (Sigma G) :- solve (G X).
solve (Pi G) :- Pi x\ (solve (G x)).

When the argument given to solve is an atomic goal with a rigid head, its arguments
are loaded into appropriate data registers and the head is used to determine
the code to be invoked subsequently. The only other situation that could possibly
arise is that the actual argument passed to solve remains a flexible atomic goal; the
syntactic restriction on the appearance of logical symbols in terms makes it impossible
for any other case to arise. In this last case, when the argument of solve is a
flexible goal, we follow the suggestion in [44] and solve the goal immediately with
a substitution of the form λx1 ... λxn ⊤ for the variable that appears as the head of
this goal.

In our implementation, the solve predicate is treated as a builtin one whose realization
is "hard-wired" into the abstract machine.

Our treatment of disjunctive goals is based on a compile-time pre-processing of
clauses to eliminate such disjunctions. Upon seeing a goal of the form (G1 ; G2), the
compiler creates a new predicate definition consisting of the following clauses:

new_pred X1 ... Xn :- G1.
new_pred X1 ... Xn :- G2.

Here, new_pred is a name chosen such that it is distinct from any other name used
in the program and {X1, ..., Xn} is the set of variables occurring free in (G1 ; G2).
After generating and adding these clauses to the program, the compiler replaces the
disjunctive goal with the atomic goal (new_pred X1 ... Xn). As a concrete example,
a clause presented in the form

foo X :- bar1 U V , (bar2 (f X) U ; bar3 (f X) V).

will be transformed into the sequence of clauses

foo X :- bar1 U V , new_pred X U V.

new_pred X U V :- bar2 (f X) U.
new_pred X U V :- bar3 (f X) V.

by the pre-processing pass just described.
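
The pre-processing pass can be sketched in a few lines of Python. The encoding of
goals as tuples, the convention that variable names start with an upper-case
letter, and the generated names new_pred0, new_pred1, ... are assumptions of this
sketch; quantified and implicational goals are not handled.

```python
import itertools


def term_vars(t):
    """Variables of a term: strings starting with an upper-case letter."""
    if isinstance(t, str):
        return [t] if t[:1].isupper() else []
    return [v for arg in t[1:] for v in term_vars(arg)]


def goal_vars(g):
    """Variables of a goal, without duplicates, in order of first occurrence."""
    if g[0] == "atom":
        found = [v for t in g[2] for v in term_vars(t)]
    else:                                   # ("and", g1, g2) or ("or", g1, g2)
        found = goal_vars(g[1]) + goal_vars(g[2])
    return list(dict.fromkeys(found))


def eliminate(goal, fresh, out):
    """Return `goal` with every disjunction replaced by a call to a new
    predicate; the clauses (head, body) defining it are appended to `out`."""
    if goal[0] == "atom":
        return goal
    left = eliminate(goal[1], fresh, out)
    right = eliminate(goal[2], fresh, out)
    if goal[0] == "and":
        return ("and", left, right)
    head = ("atom", f"new_pred{next(fresh)}", goal_vars(goal))
    out.append((head, left))
    out.append((head, right))
    return head


# Body of: foo X :- bar1 U V , (bar2 (f X) U ; bar3 (f X) V).
body = ("and",
        ("atom", "bar1", ["U", "V"]),
        ("or",
         ("atom", "bar2", [("f", "X"), "U"]),
         ("atom", "bar3", [("f", "X"), "V"])))
clauses = []
new_body = eliminate(body, itertools.count(), clauses)
```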

An alternative treatment of disjunctive goals is possible: we could build in mechanisms
for creating choice points in the bodies of clauses. Thus, in the example just
considered, we could use the following structure to compile the body of the clause for
foo:

     { Instructions for (bar1 U V) }
     try_me_else_disj L
     { Instructions for (bar2 (f X) U) }
L :  trust_me_disj
     { Instructions for (bar3 (f X) V) }

Here, the instructions try_me_else_disj and trust_me_disj are like the WAM instructions
try_me_else and trust_me except that it is the free variables occurring in the
disjunctive goal that are recorded and used by these instructions rather than the
argument registers. In the above example, instead of the contents of registers A1 and
A2, the actual information recorded in the choice point should be the bindings of the
variables X and V. Notice that we do not need to keep the information about U in this
example. In general, the compilation process would have to carry out a "usefulness"
analysis on the free variables that appear in disjunctive goals to determine the ones
that really have to be remembered.
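
One plausible reading of the "usefulness" analysis, consistent with the example but
not spelled out in the text, is that the variables to be remembered are those that
occur in the alternatives still to be tried. The Python sketch below implements
only this reading; the function names and the tuple encoding of atomic goals are
assumptions, and each alternative is taken to be a single atomic goal.

```python
def term_vars(t):
    """Variables of a term: strings starting with an upper-case letter."""
    if isinstance(t, str):
        return [t] if t[:1].isupper() else []
    return [v for arg in t[1:] for v in term_vars(arg)]


def vars_to_save(alternatives):
    """Variables occurring in any alternative after the first one, in order
    of first occurrence (assumed criterion, see the remark above)."""
    saved = []
    for goal in alternatives[1:]:
        for arg in goal[1:]:
            for v in term_vars(arg):
                if v not in saved:
                    saved.append(v)
    return saved


# (bar2 (f X) U ; bar3 (f X) V)
alternatives = [("bar2", ("f", "X"), "U"), ("bar3", ("f", "X"), "V")]
```

For the clause for foo this yields X and V, in agreement with the observation that
the information about U need not be kept.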

Compared with the approach of creating new predicates, this alternative direct 
compilation of disjunctive goals has some advantages. First, it obviates the call to 
the additional predicate new_pred and consequently avoids the runtime overhead for
such calls. Second, it provides a framework for analyzing which variables really need 
to be stored and hence for avoiding redundant book-keeping. For these reasons, the 
direct compilation of disjunctive goals is something that might be explored further 
as an improvement to our implementation ideas. 
Efficient Support for Runtime Type Processing

The processing model that we have developed for λProlog in the previous chapter
has ignored the presence of types in the language and the impact these might have
on computations. This model is accurate if the language uses a monomorphic type
system, i.e., one in which all types are determined at compile time and do not
subsequently change. However, this is not the true situation in λProlog as we have
discussed in Section 2.4; λProlog uses a first-order polymorphic type system that
leads to the possibility that the types associated with variables and constants may 
evolve during execution. Given this situation, it is important to determine the exact 
manner in which the evolution of types may impact on computation and to take ac- 
count of this in the processing model. As we shall see in this chapter, the place at 
which the identity of types is needed is in comparing constants. In particular, two 
constants may actually share a name but may be different in reality because their 
types are distinct and, moreover, do not even have a common instance. Unification 
must fail in this situation. To be able to determine failure, however, it is necessary 
to bring types along into the computation at relevant places and to actually check 
them for compatibility. 
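
The compatibility check on types amounts to first-order unification over type
expressions, which the following Python sketch illustrates. The encoding is an
assumption of the example: type variables are strings starting with an upper-case
letter, sorts are lower-case strings, and constructed types are tuples such as
("list", "int"). The constant nil with the type (list A) is used purely for
illustration.

```python
def walk(t, subst):
    """Resolve a type variable through the substitution."""
    while isinstance(t, str) and t[:1].isupper() and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    t = walk(t, subst)
    if isinstance(t, str):
        return t == v
    return any(occurs(v, arg, subst) for arg in t[1:])


def unify_types(a, b, subst):
    """Return an extended substitution if a and b have a common instance,
    and None otherwise."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    for x, y in ((a, b), (b, a)):
        if isinstance(x, str) and x[:1].isupper():     # x is a type variable
            return None if occurs(x, y, subst) else {**subst, x: y}
    if isinstance(a, str) or isinstance(b, str):
        return None                                    # distinct sorts
    if a[0] != b[0] or len(a) != len(b):
        return None                                    # constructor clash
    for x, y in zip(a[1:], b[1:]):
        subst = unify_types(x, y, subst)
        if subst is None:
            return None
    return subst


def same_constant(name1, type1, name2, type2):
    """Two constant occurrences match only if names and types both agree."""
    return name1 == name2 and unify_types(type1, type2, {}) is not None
```

Two occurrences of nil at the types (list int) and (list bool) share a name but
have no common instance, so unification of the two terms must fail.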

We discuss the impact of polymorphic typing in detail in this chapter to make 
the above picture explicit and we develop the needed machinery for treating types 
appropriately. In the first section, we indicate the refinement that is needed to the 
basic higher-order pattern unification algorithm from Chapter 4 to account for types. 
A straightforward solution to this problem would simply construct types at runtime 
to attach them to constants and to pass them as additional arguments to predicates. 
However, types can be large in practice and constructing them explicitly each time 
they are needed can be costly both in time and space. In Section 7.2, we describe an 
approach to using information available at compile time to reduce the type analysis 
needed at runtime; this approach has the additional benefit of reducing the amount 
of type information that has to be garnered at runtime. Unfortunately, this approach 
cannot eliminate the type information associated with predicates, even in some 
situations where that information is not really necessary. In Section 7.3 we discuss a 
different form of static analysis that captures these situations. The work described in 
Sections 7.2 and 7.3 has previously been presented in [47]. We conclude the chapter 
by using the approaches we develop to augment the abstract machine and compilation 
structure described in the previous chapter to incorporate a treatment of types. 

Our discussion of the treatment of types pertains only to the situation where 
the processing model is based on the use of higher-order pattern unification. The 
abstract machine and compilation model underlying Version 1 of the Teyjus system 
had used Huet's procedure for higher-order unification. We note that considerably 
more type information needs to be carried along and this also needs to be analyzed 
more carefully in this situation. The choice we have made in this thesis has therefore 
resulted in a significant simplification in the abstract machine structure along this 
dimension as well. 

7.1 Types and Higher-Order Pattern Unification 

The term formation rules presented in Section 2.1 associate a type with every 
well-formed term of λProlog. To determine this type, it is important to know the types of 
all the constants and (bound) variables that appear in the term. The usual practice, 
however, is to not specify types with variables. When we allow for polymorphic types 
as in λProlog, it is possible to infer a most general type for each term even when the 
types of (some) variables have not been provided. We assume such a procedure in 
our context. Thus at the end of the compilation phase we assume that every term 
has been determined to be type correct and that the type of each term is also known. 
In a typical programming language, the usefulness of types would end at this point. 
However, this is not the case in λProlog as we have discussed in Section 2.4. In 
particular, constants and variables may be used within a term at refinements of their 
declared types and such refinements may impact on the precise computation to be 
carried out. 

Looking naively at the relevance of types to computation, we see that the abstract 
interpreter presented in Section 4.2 has use for types in two different forms: first, in 
rules 4, 6, 7 and 8 of Definition 4.2.4, when a logic variable or a constant is introduced 
into the computation context, it should have the same type as the existential or 
universal variable that is replaced; second, the unification invoked in rules 7 and 8 
should be a typed one. We observe, however, that the types introduced in the first 
set of situations do not have a real impact on the steps in computation. Types are 
needed in checking identity in unification, as we shall see shortly, and every instance 
of each of these created objects shares the same type. Thus, 
when checking their identity, a simple lookup of the names suffices; the types would 
have to match if the names are the same. 

The introduction of types in the higher-order pattern unification can generally be 
viewed as maintaining a type along with every logic variable and constant and using 
it to determine computation at necessary points. However, the types of logic variables 
are neither examined nor refined in the process of constructing bindings. Further, the 
comparison of constants in this phase is restricted to those appearing 
as arguments of logic variables in the appropriate instance of rule (5) in Figure 4.2. 
The higher-order pattern constraint requires such constants to have a larger universe 
index than the logic variable at the head, implying thereby that they must have been 
introduced by generic goals. Hence every instance of any such constant must already 
be known to have the same type. From these observations, it is evident that types 
are incidental to the binding phase of the higher-order pattern unification. 

The real substantial usage of types in the pattern unification is in fact in the 
simplification phase for determining the applicability of rule (4) in Figure 4.2: the 
identity checking on the rigid heads of the pair of terms may also require the matching 
of their types. Observe, however, that if these heads are matching de Bruijn indexes 
(abstracted variables) or constants introduced by generic goals, then the types must 
already be identical. Thus the matching or unification of types is necessary only for 
the genuinely polymorphic constants declared at the top-level in the program. 

Based on these observations, rule (4) in Figure 4.2 can now be modified into the 
following. 

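(4.1) $\langle \langle (c_\tau\ t_1 \ldots t_n), (c_\sigma\ s_1 \ldots s_n) \rangle :: \mathcal{D}, \theta \rangle \longrightarrow \langle \langle t_1, s_1 \rangle :: \ldots :: \langle t_n, s_n \rangle :: \mathcal{D}, \phi \circ \theta \rangle$, 
      provided $c$ is a constant such that $\mathcal{L}(c) = 0$ and 
      $\phi$ is the most general unifier of $\tau$ and $\sigma$. 

(4.2) $\langle \langle (r\ t_1 \ldots t_n), (r\ s_1 \ldots s_n) \rangle :: \mathcal{D}, \theta \rangle \longrightarrow \langle \langle t_1, s_1 \rangle :: \ldots :: \langle t_n, s_n \rangle :: \mathcal{D}, \theta \rangle$, 
      provided $r$ is a constant such that $\mathcal{L}(r) > 0$ or a de Bruijn index. 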

In the rules (4.1) and (4.2), the type association to relevant constants is represented 
as a subscript. Further, the labeling function $\mathcal{L}$ of the abstract interpreter is used to 
help differentiate between constants from the top-level and those introduced by the 
execution of generic goals. Finally, since the polymorphic types in our language can 
essentially be viewed as the terms of a first-order logic, a first-order unification process 
is assumed to be invoked on the types of the constant heads in the application of rule 
(4.1), either to decide the non-applicability of this rule or to compute their most general 
unifier. Note also that we might want to provide the type instantiations 
back to the user along with answers. For this reason, we have assumed that our 
substitutions also maintain information about the ones made to type variables. 

From the above considerations, it is clear that the only sort of terms with which 
we need to maintain types at runtime are the top-level declared constants. This 
association of types can be reduced further to minimize the overhead of runtime type 
processing, as discussed in the next two sections. 
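
The head check expressed by rules (4.1) and (4.2) can be pictured with a small sketch. Python is used here purely for illustration and is not the implementation language of the system described. The tuple representation of types, the names TVar, Const, unify and heads_match, and the boolean top_level flag that stands in for the test on the labeling function are all assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TVar:
    """A type variable (hypothetical representation for this sketch)."""
    name: str

# A non-variable type is a tuple (constructor, arg1, ..., argk),
# e.g. ("int",), ("list", TVar("A")) or ("->", ("int",), ("o",)).


def walk(t, subst):
    """Dereference a bound type variable through the substitution."""
    while isinstance(t, TVar) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """Occurs check: does variable v appear in t under subst?"""
    t = walk(t, subst)
    if isinstance(t, TVar):
        return t == v
    return any(occurs(v, a, subst) for a in t[1:])


def unify(t1, t2, subst):
    """First-order unification of types; returns a substitution or None."""
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if isinstance(t1, TVar):
        return None if occurs(t1, t2, subst) else {**subst, t1: t2}
    if isinstance(t2, TVar):
        return None if occurs(t2, t1, subst) else {**subst, t2: t1}
    if t1[0] != t2[0] or len(t1) != len(t2):
        return None
    for a, b in zip(t1[1:], t2[1:]):
        subst = unify(a, b, subst)
        if subst is None:
            return None
    return subst


@dataclass(frozen=True)
class Const:
    name: str
    type: object      # the type of this particular occurrence
    top_level: bool   # True for declared constants, False for generic-goal ones


def heads_match(c1, c2, subst):
    """Rigid head check: rule (4.1) for top-level constants, (4.2) otherwise."""
    if c1.name != c2.name or c1.top_level != c2.top_level:
        return None
    if not c1.top_level:
        return subst                        # names suffice; types already agree
    return unify(c1.type, c2.type, subst)   # may fail or bind type variables
```

Under these assumptions, two occurrences of nil at types (list int) and (list string) fail to match even though they share a name, while nil at (list A) matches nil at (list int) and records the binding of A to int.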
7.2 Reducing Type Association for Constants 

An obvious solution to making types available with top-level constants is to add them 
as a special argument. For example, consider the list constructors nil of defined type 
(list A) and :: of defined type (A -> (list A) -> (list A)). When these are used 
in constructing particular lists, the type variable A would be instantiated and the 
resulting type might be added as an annotation as illustrated by the following terms: 

(1 (:: int -> (list int) -> (list int)) (nil list int)) and 

("a" (:: string -> (list string) -> (list string)) (nil list string)). 

This solution is adequate but also contains redundant information. The 
declaration of a top-level constant ensures that the type of every occurrence of the constant 
in the program has a common skeleton part that is known at compile time and that 
differences arise between the types of distinct occurrences of that constant only in 
the instantiations of variables occurring in the skeleton. Thus, the type of each 
legitimate occurrence of :: must have a skeletal structure (A -> (list A) -> (list A)) 
that is further refined by an instantiation for A. This information can be exploited by 
avoiding the construction at runtime of the skeleton that often is the most complex 
part of the type. Moreover, compile-time type checking also ensures that two different 
occurrences of :: share this skeletal structure. Hence the matching of their types can 
be achieved simply by matching the particular instantiations of the variable A. 

We use the idea above by changing the annotation associated with each top-level 
constant from a complete type to a list of types that instantiate the variables that 
occur in its skeleton; the annotation must now be a list of types because there could 
be more than one variable appearing in the skeleton. Concretely, the representations 
of the two lists considered earlier in this section now become 

(1 (:: [int]) (nil [int])) and ("a" (:: [string]) (nil [string])). 

Based on this annotation scheme, we modify the transformation rules (4.1) and 
(4.2) used in unification to the following: 

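(4.1') $\langle \langle (c\ [\tau_1, \ldots, \tau_m]\ t_1 \ldots t_n), (c\ [\sigma_1, \ldots, \sigma_m]\ s_1 \ldots s_n) \rangle :: \mathcal{D}, \theta \rangle \longrightarrow \langle \langle t_1, s_1 \rangle :: \ldots :: \langle t_n, s_n \rangle :: \mathcal{D}, \phi \circ \theta \rangle$, 
       where $\phi$ is the most general unifier for $\{\langle \tau_1, \sigma_1 \rangle, \ldots, \langle \tau_m, \sigma_m \rangle\}$, 
       $n \ge 0$ and $m \ge 0$, if $c$ is a constant. 

(4.2') $\langle \langle (r\ t_1 \ldots t_n), (r\ s_1 \ldots s_n) \rangle :: \mathcal{D}, \theta \rangle \longrightarrow \langle \langle t_1, s_1 \rangle :: \ldots :: \langle t_n, s_n \rangle :: \mathcal{D}, \theta \rangle$, 
       provided $r$ is a de Bruijn index. 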

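Notice that the type annotation for a monomorphic constant, i.e., a constant whose 
declared type does not contain variables, and for a constant introduced by a generic 
goal is an empty list. These cases are then uniformly handled by rule (4.1') as the 
case where m = 0. 

(4.1'') $\langle \langle (c\ [\tau_1, \ldots, \tau_m]\ t_1 \ldots t_n), (c\ [\sigma_1, \ldots, \sigma_m]\ s_1 \ldots s_n) \rangle :: \mathcal{D}, \theta \rangle \longrightarrow \langle \langle t_1, s_1 \rangle :: \ldots :: \langle t_n, s_n \rangle :: \mathcal{D}, \phi \circ \theta \rangle$, 
        where $\phi$ is the most general unifier for $\{\langle \tau_1, \sigma_1 \rangle, \ldots, \langle \tau_m, \sigma_m \rangle\}$, 
        $n > 0$ and $m \ge 0$, if $c$ is a constant. 

(4.1''') $\langle \langle c\ [\tau_1, \ldots, \tau_m], c\ [\sigma_1, \ldots, \sigma_m] \rangle :: \mathcal{D}, \theta \rangle \longrightarrow \langle \mathcal{D}, \theta \rangle$, 
         where $m \ge 0$, if $c$ is a constant. 

(4.2') $\langle \langle (r\ t_1 \ldots t_n), (r\ s_1 \ldots s_n) \rangle :: \mathcal{D}, \theta \rangle \longrightarrow \langle \langle t_1, s_1 \rangle :: \ldots :: \langle t_n, s_n \rangle :: \mathcal{D}, \theta \rangle$, 
       provided $r$ is a de Bruijn index. 

Figure 7.1: The type annotated simplification rules for pattern unification. 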
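
To make the change of annotation concrete, the following Python sketch reads off the list of instantiations for an occurrence of a constant by matching its declared skeleton against the full type of that occurrence. The tuple encoding of types and the names TVar, match, annotation, arrow and lst are assumptions made for this illustration only; the text does not prescribe how a compiler computes the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TVar:
    """A type variable (hypothetical representation for this sketch)."""
    name: str


def arrow(a, b):
    return ("->", a, b)


def lst(t):
    return ("list", t)


def match(skeleton, instance, binding):
    """One-way matching: bind skeleton variables so skeleton becomes instance."""
    if isinstance(skeleton, TVar):
        if binding.setdefault(skeleton, instance) != instance:
            raise ValueError("inconsistent instantiation of " + skeleton.name)
        return
    if (isinstance(instance, TVar) or skeleton[0] != instance[0]
            or len(skeleton) != len(instance)):
        raise ValueError("occurrence type is not an instance of the skeleton")
    for s, i in zip(skeleton[1:], instance[1:]):
        match(s, i, binding)


def annotation(skeleton_vars, skeleton, instance):
    """The types list: one instantiation per skeleton variable, in order."""
    binding = {}
    match(skeleton, instance, binding)
    return [binding[v] for v in skeleton_vars]


A = TVar("A")
INT = ("int",)
CONS_SKELETON = arrow(A, arrow(lst(A), lst(A)))   # declared type of ::
NIL_SKELETON = lst(A)                             # declared type of nil

# The occurrence of :: in the list (1 :: nil) has the full type
# int -> (list int) -> (list int); only [int] needs to be recorded.
cons_at_int = arrow(INT, arrow(lst(INT), lst(INT)))
cons_annotation = annotation([A], CONS_SKELETON, cons_at_int)
```

Here cons_annotation is the one-element list containing the type int, which corresponds to the annotated form (:: [int]) used in the text.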

The manner in which unification problems are processed actually allows for a 
further refinement of type annotations. The use of the transformation rules in Figure 4.2 
begins with a pair of atomic predicates whose heads will first have to be verified to 
have the same name and whose types will have to be matched; the matching of the 
types can be achieved by adding the instantiations of the type variables in the skeleton 
type as explicit arguments to the predicate and then compiling unification of these 
types as we shall see shortly. Once we have checked the matching of these types, we 
will then be assured that the actual argument terms that have to be unified have the 
same types. Further, the unification transformation rules preserve this relationship 
between the terms in each disagreement pair. Thus, at the time when the types of 
different instances of a constant are being unified in the rule (4.1'), their target types 
are known to be identical. This fact implies that once we have checked that the 
constants heading the two terms have a common name, there is no need to perform 
unification over the instances of type variables that appear in the target type of their 
type skeleton. In the case that all the variables in the declared type also appear 
in the target type, i.e., when the constant type satisfies what is known as the type 
preservation property [23], there is really no need to maintain any type annotations 
with the constant. This happens to be the case for both :: and nil, for instance, 
and so all type information can be elided from lists that are implemented using these 
constants. A further observation that can be made is that when the disagreement 
pair under consideration consists of two constants only, their types are guaranteed to be 
identical already, so that type unification can be completely eliminated in this case. 
This leads to the final form of the transformation rules for simplifying rigid-rigid pairs 
that we present in Figure 7.1. 
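
The selection of the type variables whose instantiations must stay in the types list can be sketched as follows: split the skeleton into argument types and target type, and keep only the variables that occur in some argument type but not in the target. This is a minimal Python sketch; the tuple encoding of types and the names TVar, arrow, split, type_vars, retained_vars and type_preserving are assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TVar:
    """A type variable (hypothetical representation for this sketch)."""
    name: str


def arrow(*types):
    """Right-nested function type: arrow(a, b, c) stands for a -> (b -> c)."""
    *args, target = types
    for a in reversed(args):
        target = ("->", a, target)
    return target


def split(skeleton):
    """Split a type skeleton into its argument types and its target type."""
    args = []
    while not isinstance(skeleton, TVar) and skeleton[0] == "->":
        args.append(skeleton[1])
        skeleton = skeleton[2]
    return args, skeleton


def type_vars(t):
    """Type variables of t, in order of first occurrence."""
    if isinstance(t, TVar):
        return [t]
    found = []
    for a in t[1:]:
        found += [v for v in type_vars(a) if v not in found]
    return found


def retained_vars(skeleton):
    """Variables that occur only in argument types; these stay in the types list."""
    args, target = split(skeleton)
    in_target = type_vars(target)
    kept = []
    for a in args:
        kept += [v for v in type_vars(a) if v not in in_target and v not in kept]
    return kept


def type_preserving(skeleton):
    """True when every variable of the skeleton also occurs in its target type."""
    return retained_vars(skeleton) == []


A = TVar("A")
LIST_A = ("list", A)
CONS = arrow(A, LIST_A, LIST_A)              # type of ::
NIL = LIST_A                                 # type of nil
APPEND = arrow(LIST_A, LIST_A, LIST_A, ("o",))
```

With these definitions both CONS and NIL are type preserving, so no annotation is kept for them, whereas retained_vars(APPEND) yields the single variable A because the target type o contains no variables.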
We now consider the correctness of the rules in Figure 7.1 relative to the original 
rule for simplifying rigid-rigid pairs. We begin with the assumption that the two 
terms in any disagreement pair considered by the transformation rules for unification 
have the same types. It is easy to see then that this property is preserved by the 
transformation rules in Figure 4.2. The first refinement to rule (4), i.e., the one 
contained in the rules (4.1) and (4.2), is easily seen to be correct once we note that 
the identity of a constant is determined also by its type. The correctness of the 
subsequent refinements to this rule that lead to the rules in Figure 7.1 then relies on 
the fact that, given two rigid terms of equal types that have a constant with the same 
name as their heads, unifying the instantiations of the variables that appear only in 
the argument types of the constant head in its two different occurrences will ensure 
that the types of these occurrences are equal and, furthermore, will make the types 
of the arguments in the two rigid terms also equal. The following theorem shows this 
to be the case. 

Theorem 7.2.1. Let $c$ be a constant that has as its type skeleton the type $\alpha$ with $n$ 
argument types. Further, let $\{U_1, \ldots, U_k\}$ be the set of variables that appear in the 
target type of $\alpha$ and let $\{V_1, \ldots, V_l\}$ be the variables that appear only in the argument 
types of $\alpha$. Now suppose that $(c\ t_1 \ldots t_n)$ and $(c\ s_1 \ldots s_n)$ are two terms that have 
the same type $\beta'$ and let $\alpha_1$ and $\alpha_2$ be the types of $c$ in these two terms. Obviously, $\alpha_1$ 
and $\alpha_2$ are generated by applying substitutions to $\alpha$. We assume that any variables 
appearing in the ranges of these substitutions are fresh, i.e., they have not been used 
previously in the computation. Let 

$$\phi_1 = \{\langle V_i, \tau_i^1 \rangle \mid 1 \le i \le l\} \quad \text{and} \quad \phi_2 = \{\langle V_i, \tau_i^2 \rangle \mid 1 \le i \le l\}$$

be the restrictions of these respective substitutions to the variables appearing only 
in the argument types of $\alpha$. Then $\alpha_1$ and $\alpha_2$, the types of $c$ in the two terms, are 
unifiable by a substitution $\theta$ if and only if $\theta(\tau_i^1) = \theta(\tau_i^2)$ for $1 \le i \le l$. Moreover, any 
$\theta$ satisfying this property makes the types of $t_i$ and $s_i$ identical for $1 \le i \le n$. 

Proof. Any substitution $\theta$ that unifies $\alpha_1$ and $\alpha_2$ makes the argument types of $c$ in 
the two terms identical. This is the same as saying that the types of the arguments 
of $c$ must be identical under the substitution. Thus, it only remains to show that $\theta$ 
unifies $\alpha_1$ and $\alpha_2$ if and only if the condition mentioned in the theorem is satisfied. 

Restricting attention to only the variables appearing in $\alpha$, the substitutions that 
produce $\alpha_1$ and $\alpha_2$ from $\alpha$ can be partitioned into substitutions for the variables 
$\{U_1, \ldots, U_k\}$ and the substitutions $\phi_1$ and $\phi_2$ respectively. Moreover, since the target 
types of $\alpha_1$ and $\alpha_2$ are identical, the former substitution can be assumed to be the 
same in both cases. Let us take it to be $\phi$. By assumption, the domains of $\phi_1$ and 
$\phi_2$ do not contain any variables in the range of $\phi$. Thus, we may write $\alpha_1$ and $\alpha_2$ as 
$\phi_1(\phi(\alpha))$ and $\phi_2(\phi(\alpha))$, respectively. Now, for any unifier $\theta$ of $\alpha_1$ and $\alpha_2$ we have the 
following: 

$$\begin{aligned}
\theta(\alpha_1) = \theta(\alpha_2) &\iff \theta(\phi_1(\phi(\alpha))) = \theta(\phi_2(\phi(\alpha))) \\
&\iff (\theta \circ \phi_1)(\phi(\alpha)) = (\theta \circ \phi_2)(\phi(\alpha))
\end{aligned}$$

Since the range of $\phi$ does not contain $V_1, \ldots, V_l$, it is easy to see that the last condition 
holds if and only if $(\theta \circ \phi_1)(V_i) = (\theta \circ \phi_2)(V_i)$ for $1 \le i \le l$. But this clearly holds if and 
only if $\theta(\tau_i^1) = \theta(\tau_i^2)$ for $1 \le i \le l$. □ 

The ideas we have described may be applied to the append program appearing in 
Section 2.4. In the type skeleton of the predicate constant append, (list A) -> (list 
A) -> (list A) -> o, the type variable A appears in the argument types but not in the 
target. For this reason, the binding of A should be associated with the occurrences 
of append. We have already seen that type annotations are dropped from :: and nil. 
Thus the definition of append is viewed as the following in our implementation. 

append [A] nil L L. 

append [A] (X :: L1) L2 (X :: L3) :- append [A] L1 L2 L3. 

Correspondingly, a query of form (append (1 :: nil) (2 :: nil) L) becomes 

append [int] (1 :: nil) (2 :: nil) L. 

The final point to be noticed with regard to our type annotation scheme is that 
it is capable also of dealing with the situations where the type preservation property 
is violated. For example, consider a representation of heterogeneous lists based on the 
constants null and cons declared as follows. 

kind lst type. 

type null lst. 

type cons A -> lst -> lst. 

The list containing the integer 1 and the string "list" as its elements would then be 
represented by the term 

(cons [int] 1 (cons [string] "list" null)). 

Further, the unification of this term with another term representing a list would 
naturally involve unifying the type arguments of cons which, by Theorem 7.2.1, would 
achieve the effect of checking that the relevant occurrences of cons actually are (or 
can be made) identical. 
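
As a worked instance of Theorem 7.2.1 for cons, consider unifying (cons [int] 1 null) with a second term (cons [B] X null), where the second term and the fresh type variable B are hypothetical and chosen only for this illustration. The target type lst contains no variables, so the only variable to account for is A.

```latex
\begin{align*}
\alpha   &= A \rightarrow \mathit{lst} \rightarrow \mathit{lst}, \qquad k = 0, \quad l = 1, \quad V_1 = A \\
\alpha_1 &= \mathit{int} \rightarrow \mathit{lst} \rightarrow \mathit{lst}, \qquad
            \phi_1 = \{\langle A, \mathit{int} \rangle\}, \quad \tau_1^1 = \mathit{int} \\
\alpha_2 &= B \rightarrow \mathit{lst} \rightarrow \mathit{lst}, \qquad
            \phi_2 = \{\langle A, B \rangle\}, \quad \tau_1^2 = B \\
\theta(\alpha_1) = \theta(\alpha_2)
         &\iff \theta(\tau_1^1) = \theta(\tau_1^2)
          \iff \theta(B) = \mathit{int}
\end{align*}
```

Any such $\theta$ also gives the first arguments, 1 of type int and X of type B, the same type int, while the second arguments already share the type lst. Unifying the one-element types lists [int] and [B] therefore does all the type checking that this pair of terms requires.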
7.3 Reducing Type Annotations with Clauses 

None of the type variables appearing in the type of a predicate constant can appear 
in its target type since this type is o. Thus it is not possible to use the ideas in 
the previous section to drop the annotation corresponding to any of these variables. 
Despite this, it can be observed that the bindings for some of the variables appearing 
in the heads of clauses defining certain predicates cannot have any impact on the 
computation. As a particular example, consider the predicate append, an annotated 
version of whose definition was presented at the end of the last section. Since the 
annotation does not refine the declared type of append in either of these clauses, the 
particular type of append in any well-formed goal that has this predicate as its head 
will not be the cause for failure in head unification. Moreover, the instantiation of 
this variable only gets used in the annotation of a recursive call to append where, by 
the same analysis, it again cannot cause failure in unification. Thus, if we maintain 
an annotation for this type variable with the clauses for append, we would be creating 
a possibly complex type term only for the purpose of passing it on from recursive call 
to recursive call. 

To eliminate the redundant type associations with clause definitions, we describe 
in this section a systematic process for determining the elements of the types list 
associated with a predicate name that could potentially influence a computation. For 
the types not in this list we can conclude that they can be elided. 

The process of determining the potentially "needed" elements in the types list 
is organized around the full set of clauses defining the predicate constant, including 
those contained by augment goals. If the definition of a predicate can be dynamically 
extended, i.e., if there are clauses for the predicate embedded in augment goals, we 
assume every element in the types list of the predicate is needed: specific bindings 
for type variables appearing in the embedded clause might be determined when the 
enclosing clause is used in a backchaining step, and then these types will be needed 
in determining the applicability of the clause. For a predicate all of whose clauses 
appear only at the top-level, our analysis can be more sophisticated. An element 
in the types list of the predicate being defined is needed if the value in the relevant 
position in the list associated with the particular predicate constant occurrence at 
the clause head is anything other than a variable: unification over this element must 
be attempted during clause selection since it has the possibility of failing in this case. 
Another situation in which the element is needed is if it is a type variable that occurs 
elsewhere in the same types list or in the type lists associated with a non-predicate 
constant that occurs in the clause. The rationale here is that either the variable will 
already have a binding that must be tested against an incoming type or a value must 
be extracted into it that is used later in a unification computation of consequence. A 
more subtle situation for the variable case is when it occurs in the types list associated 
with the predicate head of a clause contained by an augment goal in the body. In 
this case the binding that is extracted at runtime in the variable has an impact on 
the applicability of the clause that is added and consequently is a needed one. 

The only case that remains to be considered is that where a variable element in the 
types list for the clause head appears also in the types list associated with a predicate 
constant in a goal position in the body, either at the top-level or, recursively, in an 
embedded clause definition. It can be observed that precise neededness information 
for the head predicate can be determined only after those of the body predicates are 
available. For this reason, our analysis in this case first determines the neededness 
information for the predicate constants appearing at the heads of goals in the body 
and then uses this information in the analysis for the predicate that is being defined 
by the clause. As an example of how this might work, consider the following program 
annotated in the style of Section 7.2. 

type print A -> o. 

type printlist (list A) -> o. 

print [int] X :- { code for printing the integer value bound to X }. 
print [string] X :- { code for printing the string value bound to X }. 

printlist [A] nil. 

printlist [A] (X::L) :- print [A] X, printlist [A] L. 

In this code, print is a predicate that is defined to be polymorphic in an ad hoc way 
and consequently has genuine use for its type argument. This information can be used 
to determine that it needs its type adornment and the following analysis exposes the 
fact that printlist must therefore carry its type annotation. 

The approach suggested above needs refinement to be applicable to a context 
where dependencies between definitions can be iterated and even recursive; at present, 
it does not apply directly even to the definition of append. The solution is to use an 
iterative, fixed-point computation that has as its starting point the neededness 
information gathered by initially ignoring predicate constants appearing in goal positions 
in the body of the clause. In effecting this calculation relative to a given program $\mathcal{P}$, 
we employ a two-dimensional global boolean array called needed whose first index, p, 
ranges over the set of predicate constants appearing in $\mathcal{P}$ and whose second index, 
i, is a positive integer that ranges over the length of the types list for p; this array 
evidently has a variable size along its second dimension. The intention is that if, at 
the end of the computation, needed[p][i] is false then the ith element in the types list 
associated with p does not have an influence on the solution of any goal G from $\mathcal{P}$. 
find_needed(P) { 
    init_needed(P); 
    repeat 
        for each top-level non-atomic clause C in elab(P) 
            process_clause(C); 
    until (the value of needed does not change) 
} 

init_needed(P) { 
    for every embedded clause C in elab(P) with (p [τ1, ..., τk] t1 ... tn) as head 
        for 1 ≤ i ≤ k 
            needed[p][i] = true; 
    for every top-level clause C in elab(P) with (p [τ1, ..., τk] t1 ... tn) as head 
        for 1 ≤ i ≤ k 
            if τi is not a type variable 
                needed[p][i] = true; 
            else 
                if ((τi occurs in τj for some j such that 1 ≤ j ≤ k and i ≠ j) or 
                    (τi occurs in the types list of a non-predicate constant in C) or 
                    (τi occurs in the types list of a predicate constant appearing 
                     as the head of an embedded clause in the body of C)) 
                    needed[p][i] = true; 
} 

Figure 7.2: The top-level control for determining if a predicate type argument is 
needed. 
process_clause(C) { 
    let C be of the form (p [τ1, ..., τk] t1 ... tn :- G). 
    for 1 ≤ i ≤ k 
        if needed[p][i] is false 
            needed[p][i] = process_body(G, τi); 
} 

process_body(G, τ) : boolean { 
    switch on the top-level structure of G: 
        ∀G', ∃G': return process_body(G', τ); 
        G1 ∧ G2, G1 ∨ G2: return (process_body(G1, τ) or process_body(G2, τ)); 
        D ⊃ G: return (process_body(G, τ) or process_embedded_body(D, τ)); 
        A of the form (q [σ1, ..., σl] s1 ... sm): 
            if τ occurs in σi for some i such that 1 ≤ i ≤ l and needed[q][i] is true 
                return true; 
            else 
                return false; 
} 

process_embedded_body(D, τ) : boolean { 
    switch on the top-level structure of D: 
        ∀D1: return process_embedded_body(D1, τ); 
        D1 ∧ D2: return (process_embedded_body(D1, τ) or process_embedded_body(D2, τ)); 
        G ⊃ A: return process_body(G, τ); 
        A: return false; 
} 

Figure 7.3: The clause processing for determining if a predicate type argument is 
needed. 
We compute the value of this array by initially setting all the elements of needed to 
false and then calling the procedure find_needed defined in Figure 7.2 and Figure 7.3 
on the program $\mathcal{P}$. 

There are only finitely many elements in the needed matrix for any program $\mathcal{P}$, 
and an element that has been set to true is never reset to false. Every iteration of the 
loop other than the last must therefore change at least one element from false to true, 
and so the invocation of find_needed must always terminate. 
Theorem 7.3.1 below shows that, when it does terminate, it provides us a conservative 
estimate of the type annotations that have a role to play in computation. Using this 
theorem, we see that we can correctly eliminate those type variable locations from 
clause and goal heads that are determined not to be needed for any given predicate 
by this procedure. 
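
The fixed-point computation can be tried out on the append and printlist programs with the following Python sketch. It is a simplification under stated assumptions rather than the procedure of Figures 7.2 and 7.3 in full: it covers only programs whose clauses all appear at the top level with bodies that are conjunctions of atomic goals, so embedded clauses, quantifiers and disjunctions are left out. The Clause record and the tuple encoding of types are hypothetical, and a body predicate with no clauses in the program is conservatively treated as needing all of its type arguments.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TVar:
    """A type variable (hypothetical representation for this sketch)."""
    name: str


def occurs_in(v, t):
    """Does type variable v occur in type t (a TVar or a constructor tuple)?"""
    if isinstance(t, TVar):
        return t == v
    return any(occurs_in(v, a) for a in t[1:])


@dataclass
class Clause:
    pred: str                                        # predicate at the head
    head_types: list                                 # its types list at the head
    body: list = field(default_factory=list)         # goals as (pred, types list)
    const_types: list = field(default_factory=list)  # types lists of other constants


def init_needed(program):
    """Starting point: ignore predicate constants in goal positions."""
    needed = {}
    for c in program:
        row = needed.setdefault(c.pred, [False] * len(c.head_types))
        for i, tau in enumerate(c.head_types):
            if not isinstance(tau, TVar):
                row[i] = True          # unification with it can fail
            elif (any(occurs_in(tau, t)
                      for j, t in enumerate(c.head_types) if j != i)
                  or any(occurs_in(tau, t) for ts in c.const_types for t in ts)):
                row[i] = True          # its binding is tested or used elsewhere
    return needed


def process_clause(c, needed):
    """Mark head type arguments that flow into a needed argument of a body goal."""
    changed = False
    for i, tau in enumerate(c.head_types):
        if not needed[c.pred][i] and any(
                occurs_in(tau, sigma) and (q not in needed or needed[q][k])
                for q, sigmas in c.body for k, sigma in enumerate(sigmas)):
            needed[c.pred][i] = True
            changed = True
    return changed


def find_needed(program):
    needed = init_needed(program)
    changed = True
    while changed:                     # entries only go from False to True
        changed = False
        for c in program:
            if c.body:                 # non-atomic clauses only
                changed = process_clause(c, needed) or changed
    return needed


A = TVar("A")
PROGRAM = [
    Clause("append", [A]),
    Clause("append", [A], body=[("append", [A])]),
    Clause("print", [("int",)]),       # body elided: code for printing integers
    Clause("print", [("string",)]),    # body elided: code for printing strings
    Clause("printlist", [A]),
    Clause("printlist", [A], body=[("print", [A]), ("printlist", [A])]),
]
RESULT = find_needed(PROGRAM)
```

On this input the sketch leaves the single type argument of append unmarked, marks that of print during initialization because its clause heads carry non-variable types, and marks that of printlist in the first pass of the loop because the variable A is passed on to print.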

Theorem 7.3.1. Let p be a predicate constant defined in $\mathcal{P}$ and let it be the case that 
when find_needed($\mathcal{P}$) terminates, needed[p][i] is set to false. Then the ith element in 
the types list of p has no impact on the solvability of any goal G from $\mathcal{P}$. 

Proof. We shall prove the contrapositive form of the theorem: if the solvability of 
G from $\mathcal{P}$ is dependent on the ith element of the types list of a predicate p, then 
needed[p][i] must be set to true by find_needed($\mathcal{P}$). 

From an examination of Definitions 4.2.4 and 4.2.5, it can be seen that the ith 
element of the types list of p affects the computation resulting from G relative to 
$\mathcal{P}$ only if there is a sequence of atomic formulas of the form $A_1, \ldots, A_n$ with $A_1$ 
having the predicate p as its head and there is a sequence $D_1, \ldots, D_n$ of clauses in 
the elaboration of $\mathcal{P}$ augmented with type instances of embedded clauses in $\mathcal{P}$ and a 
sequence of positive numbers $j_1, \ldots, j_n$ such that 

1. for $1 < i \le n$, $A_{i-1}$ is an instance of the head of $D_{i-1}$ and $A_i$ appears as a goal in 
the body of that instance of $D_{i-1}$, 

2. for $1 \le i \le n - 1$, the $j_i$th type argument in the head of $D_i$ is a variable and, 
further, it appears in the $j_{i+1}$th type argument of the goal in the body of $D_i$ 
that has $A_{i+1}$ as its instance, 

3. $j_1 = i$, and 

4. the $j_n$th type argument of $A_n$ directly affects computation either because it has 
to be unified with a non-variable type argument in the head of $D_n$ or because 
its value imposes a structure requirement on some other type argument of the 
head or on the type of an embedded clause or of a constant appearing in a place 
different from the head of an atomic goal in the body. 

Letting $p = p_1, \ldots, p_n$ be the predicate heads of the goals in the sequence $A_1, \ldots, A_n$, 
we claim that find_needed will result in needed[$p_i$][$j_i$] being annotated to true for 
$1 \le i \le n$. The desired conclusion follows from this. 
We prove the claim by a backwards induction on the sequence. 

For the base case, an inspection of the procedure init_needed shows that the 
possibilities described for the $j_n$th type argument impacting on the computation can arise 
only in the situations in which this procedure causes needed[$p_n$][$j_n$] to be marked true; 
the only slightly tricky situation is that where $D_n$ is a type instance of an embedded 
clause but this is handled by noting that needed[$p_n$][k] is marked true for all k in 
this case. Noting that once an entry in the needed matrix has been marked true, this 
marking persists through the rest of the computation of find_needed then concludes 
the argument. 

Assume now that the claim is true for the sequence $p_{k+1}, \ldots, p_n$. This means 
in particular that needed[$p_{k+1}$][$j_{k+1}$] must be marked true. If $D_k$ is an instance of a 
clause in elab($\mathcal{P}$), then an inspection of the procedures process_clause and process_body 
shows that needed[$p_k$][$j_k$] must also be marked true during some iteration of the loop 
in find_needed. If $D_k$ is an instance of a type instance of an embedded clause on 
the other hand, then init_needed will mark needed[$p_k$][$j_k$] true as a special case of 
marking needed[$p_k$][l] true for all l. Since a true annotation persists in the computation 
of find_needed, the claim follows for the sequence $p_k, \ldots, p_n$, thus completing the 
inductive argument. 

□ 

As a particular example of the use of this theorem, we observe that the type list 
argument for the version of append shown in the last section can be eliminated, thus 
reducing the definition of this predicate that needs to be used at runtime to what is 
essentially the untyped form. More generally, if every type argument for the head 
predicate of a clause is a variable — a property called type generality in [23] — and every 
constant is type preserving and there are no embedded clauses, then types can be 
eliminated entirely during computation. 

7.4 Low-Level Support for Types and their Compilation 

We now can consider the integration of the runtime processing of types into our 
abstract machine based on our annotation scheme. 

The first issue to be solved is the low-level representation of types. As already 
mentioned, the types in the λProlog language can be essentially viewed as first-order 
terms. This allows us to use the usual encoding of first-order terms in the WAM 
for types in λProlog. In particular, a memory cell is used for each type with a tag 
indicating its category as one of type variable, type constant and type structure. For 
a type variable, the category tag is the only important information to be maintained. 
For a type constant, a reference to its descriptor is kept along with the tag. The 
additional information with a type structure consists of a reference to a sequence of 
cells in which the first corresponds to the type constructor of a fixed arity and the 
subsequent ones, in the number given by the arity, to the arguments. 

The association of types with (term) constants is realized as follows. A new 
class of constants is introduced to the term representation described in Section 5.2 as 
those with runtime type annotations. The only extra information maintained with a 
constant of this sort is a reference to a type environment that contains the elements 
in the types list of the constant decided by the compiler in the way described in 
Section 7.2. The size of this type environment is stored along with the constant 
descriptor. 

The usages of the data areas of our abstract machine are also extended. First, the 
heap and the stack are used to store types in addition to terms. Second, the bindings 
of type variables are also trailed whenever it is necessary to do so. Further, the 
PDL is also used in the course of type unifications invoked in an interpretive mode. 
Finally, the data registers A1 to An can be used to refer to a type, and an additional 
register TS — similar to the register S for terms — is used for the decomposition of type 
structures. 
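
To make the role of the PDL and the trail in interpretive type unification concrete, the following is a minimal sketch in C. It is an illustration under stated assumptions rather than the emulator's code: the Type record with an explicit binding field, the fixed-size pdl and trail arrays, and the functions deref, bind, undo_to and unify_types are all hypothetical. Every binding is trailed here for simplicity, whereas the machine trails only when necessary, and, as in the WAM, no occurs check is performed.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TY_VAR, TY_CONST, TY_STRUCT } TypeTag;

typedef struct Type Type;
struct Type {
    TypeTag tag;
    int     id;       /* TY_CONST, TY_STRUCT: identity of constant/constructor */
    int     arity;    /* TY_STRUCT only */
    Type  **args;     /* TY_STRUCT only */
    Type   *binding;  /* TY_VAR only: NULL while the variable is unbound */
};

#define PDL_MAX   64
#define TRAIL_MAX 64
static Type *pdl[PDL_MAX];     static int pdl_top = 0;
static Type *trail[TRAIL_MAX]; static int trail_top = 0;

/* Follow variable bindings to the representative type. */
static Type *deref(Type *t) {
    while (t->tag == TY_VAR && t->binding != NULL) t = t->binding;
    return t;
}

/* Bind a type variable and record it on the trail. */
static void bind(Type *var, Type *value) {
    assert(trail_top < TRAIL_MAX);
    var->binding = value;
    trail[trail_top++] = var;
}

/* Undo every binding made since the trail had height `mark`. */
static void undo_to(int mark) {
    while (trail_top > mark) trail[--trail_top]->binding = NULL;
}

/* Unify two types using the PDL as a stack of pending pairs.
   Returns 1 on success; on failure undoes its bindings and returns 0. */
static int unify_types(Type *a, Type *b) {
    int mark = trail_top;
    pdl_top = 0;
    pdl[pdl_top++] = a;
    pdl[pdl_top++] = b;
    while (pdl_top > 0) {
        Type *y = deref(pdl[--pdl_top]);
        Type *x = deref(pdl[--pdl_top]);
        if (x == y) continue;
        if (x->tag == TY_VAR) { bind(x, y); continue; }
        if (y->tag == TY_VAR) { bind(y, x); continue; }
        if (x->tag != y->tag || x->id != y->id) {
            undo_to(mark);
            pdl_top = 0;
            return 0;
        }
        if (x->tag == TY_STRUCT) {
            /* Same constructor implies same fixed arity. */
            assert(x->arity == y->arity);
            assert(pdl_top + 2 * x->arity <= PDL_MAX);
            for (int i = 0; i < x->arity; i++) {
                pdl[pdl_top++] = x->args[i];
                pdl[pdl_top++] = y->args[i];
            }
        }
    }
    return 1;
}
```

Unifying (pr A A) with (pr int int) in this sketch binds A to int, whereas unifying it with (pr int bool) fails and leaves A unbound again.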

Our implementation also supports a compiled treatment of type unification. 
Essentially, such computation can be encountered in the following two situations. 
First, it can be the result of unifying the types list of a predicate constant appearing 
as a clause head with the types appearing in an actual goal. Second, it 
could be required during term unification when the types of two occurrences of a 
constant of the same name have to be checked for compatibility. In both cases, the 
elements in the types list are viewed as additional arguments of the given constant 
and are handled by the conventional get and unify instructions respectively. 

We consider the compilation of the definition of printlist provided in the previ- 
ous section to illustrate the use of type unification instructions to handle the types 
argument of a predicate constant. The instructions generated for the clause 

printlist [A] (X::L) :- print [A] X, printlist [A] L. 

take the following structure. 

    allocate            3 
    get_type_variable   Y1, A2       % A2 = A 
    get_list            A1           % A1 = (:: 
    unify_variable      A1           %         X     (A1 = X) 
    unify_variable      Y2           %         L)    (Y2 = L) 
    finish_unify 
    put_type_value      Y1, A2       % A2 = A 
    call_name           2, print     % print A1 A2 
    put_type_value      Y1, A2       % A2 = A 
    put_value           Y2, A1       % A1 = L 
    deallocate 
    execute_name        printlist    % printlist A1 A2 

The instructions get_type_variable, put_type_variable and put_type_value used here 
correspond to the get_variable, put_variable and put_value instructions of the WAM. From 
the compiled form, it should be evident that the variable A in the types lists of print 
and printlist is treated as an additional argument of these predicate constants. 

To deal with the situation where it is necessary to compile the matching with a 
constant that has a non-empty types list associated with it, new instructions are 
introduced to make the transition from term unification to type unification. One of 
these instructions is 

    get_typed_structure Ai, f, n 

which is a variant of the get_structure instruction and is used for compiling a first-order 
application term whose constant head has type associations. The action underlying 
this instruction differs from its "untyped" version in the manipulation of the constant 
head given by f. If f has to be created on the heap, a typed constant cell is 
constructed with an empty type environment, and the register TS is set to refer to 
this environment, with the assumption that it will be filled in by the execution of 
the subsequent unify_type instructions in the WRITE mode. Alternatively, if the 
term referred to by Ai is a first-order application with head f, the TS register is set 
to refer to its type environment, and it is assumed that the actual unification against 
the types in the environment will be carried out by the following unify_type 
instructions executed in the READ mode. 

For a concrete example, assume we have a kind pr corresponding to the set of 
tuple types. Further, assume the constants pair and first are used to denote functions 
returning a pair consisting of the given two arguments and returning the first 
argument of the given pair respectively. 

    kind   pr      type -> type -> type. 
    type   pair    A -> B -> (pr A B). 
    type   first   (pr A B) -> A. 

Then the compilation of the term (first [B] (pair X Y)) appearing in a clause head 
results in the following sequence of instructions: 

    get_typed_structure   A1, first, 1   % A1 = (first 
    unify_type_variable   A2             %          [B] 
    unify_variable        A3             %          A3) 
    get_structure         A3, pair, 2    % A3 = (pair 
    unify_variable        A4             %          X 
    unify_variable        A5             %          Y) 



The instruction unify_type_variable used above corresponds to the unify_variable 
instruction in the WAM. 
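
The two modes of get_typed_structure can be illustrated with a deliberately simplified C sketch. Several things in it are assumptions made for brevity: terms are updated in place instead of being built on a heap and bound through a reference, types are reduced to integer codes with TYPE_UNBOUND standing for a fresh type variable, no trailing is done, the table type_env_size is a hypothetical stand-in for the size kept with the constant descriptor, and unify_type_constant is a hypothetical member of the unify_type class.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TYPES    4
#define TYPE_UNBOUND 0   /* stands for a fresh, unbound type variable */

typedef enum { MODE_READ, MODE_WRITE } Mode;
typedef enum { T_UNBOUND, T_TYPED_APP } TermTag;

typedef struct {
    TermTag tag;
    int     head;                 /* index of the constant head f */
    int     arity;                /* number of term arguments n */
    int     type_env[MAX_TYPES];  /* type environment of the typed constant */
} Term;

typedef struct {
    Mode mode;
    int *TS;                      /* next type environment entry to process */
} Machine;

/* Hypothetical descriptor data: type environment size of each constant. */
static const int type_env_size[] = { 0, 1, 2 };

/* WRITE mode: create the typed constant with an empty type environment.
   READ mode: the head matches, so expose its existing type environment.
   Returns 0 when the term is an application with a different head. */
static int get_typed_structure(Machine *m, Term *ai, int f, int n) {
    if (ai->tag == T_UNBOUND) {
        ai->tag = T_TYPED_APP;
        ai->head = f;
        ai->arity = n;
        for (int i = 0; i < type_env_size[f]; i++)
            ai->type_env[i] = TYPE_UNBOUND;
        m->mode = MODE_WRITE;
    } else if (ai->head == f && ai->arity == n) {
        m->mode = MODE_READ;
    } else {
        return 0;
    }
    m->TS = ai->type_env;
    return 1;
}

/* Fill in the next entry (WRITE) or unify against it (READ). */
static int unify_type_constant(Machine *m, int type_code) {
    assert(type_code != TYPE_UNBOUND);
    int *slot = m->TS++;
    if (m->mode == MODE_WRITE || *slot == TYPE_UNBOUND) {
        *slot = type_code;
        return 1;
    }
    return *slot == type_code;
}
```

The point of the sketch is only that the same unify_type instruction either fills or checks the entry that TS designates, depending on the mode set by the preceding get_typed_structure.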

Typed variants of the get_constant and unify_constant instructions are also 
included. These are specifically the following: 

    get_typed_constant Ai, c, L    and    unify_typed_constant c, L. 

As in the case of get_typed_structure, when the constant c is created by these 
instructions, a typed constant cell with an associated empty type environment referred 
to by TS is constructed. However, in the situation when the term referred to by Ai is a 
constant of the same name, the elements in the types lists of the two instances of c 
must already be identical, and so unifications over them can be safely elided. For 
this purpose, an additional argument L is used in these instructions to indicate the 
address of the instruction immediately following those for constructing the types list 
of c, so that execution can jump to the location L in the described situation. 

Additions are also made in the put and set classes of instructions to support the 
creation of typed constants in a similar manner to that in the get and unify classes 
described above. Specifically, the new instructions 

    put_typed_constant Ai, c    and    set_typed_constant c 

are added. Moreover, since the put and set instructions for term creation could 
interleave with those in the get and unify classes for the purpose of solving the 
higher-order part of unification in an interpretive manner, the usage of the put_type 
and set_type instructions is also extended to clause heads. 

The last issue to be clarified with regard to types concerns the treatment of the 
types argument of a constant when it is used both as a predicate and as a non-predicate 
in a program: when the constant appears as a predicate, its types argument may be 
further reduced, making the number of type arguments of such an occurrence 
inconsistent with that of its non-predicate occurrences. This phenomenon 
can be illustrated by the following example, which defines the meta-level application 
of binary functions. 

    type apply (A -> A -> A -> o) -> A -> A -> A -> o. 
    apply P Arg1 Arg2 Result :- P Arg1 Arg2 Result. 

Using append defined before as the "function" that is to be applied, the following 
query can be asked: 

    ?- apply (append [A]) (1 :: nil) (2 :: nil) R. 

Note that the occurrence of append in the above query should be associated with the 
type variable A based on our type annotation scheme. The computation of this query 
requires the solution of 

    solve (append [A] (1 :: nil) (2 :: nil) R), 

in the course of which the usage of append is transformed into the head of a goal, 
which the compiler treats as an occurrence without type annotations. 

To solve this problem, the types list of a predicate constant is carefully organized 
in our implementation in the way that those required by a predicate usage of this 
constant but not by a non-predicate usage should always appear before the others, 
and their lengths are also recorded along with the descriptor of the constant. This 
information is then taken into account by solve in loading the arguments of the 
predicate constant into registers: the types that are not needed for the predicate 
usage of the constant are simply discarded. 

It is interesting to contrast the treatment of types we have described in this chap- 
ter with the one used in Version 1 of the Teyjus system. In the latter system, types 
have to be maintained not only with constants but also with logic variables; this is 
necessary because the types of such variables play a role in determining the struc- 
tures of bindings calculated in unification. Among the different ideas that we have 
described in this chapter for reducing runtime type computations, the only one that 
is applicable in that setting is the one based on separating a type into a skeleton and 
type environment part. This optimization is actually also employed by Version 1 of 
the Teyjus system. From an implementation standpoint, that system also provides 
a means for representing types and it includes suitable term and type unification 
instructions to support the compilation of the relevant computations on types, namely 
their creation and unification. At a detailed level, there is a difference between the 
representation used for function types in our setting and in Teyjus Version 1. In the 
latter context, it is important to be able to access the argument and target types quickly 
and to determine the number of arguments in the function type; these attributes are 
used in generating unifiers. To facilitate such an examination, function types are 
represented in "un-curried" form, i.e., a type such as 
$\alpha_1 \rightarrow \cdots \rightarrow \alpha_n \rightarrow \beta$ is represented 
as a pair of a vector containing the types $\alpha_1, \ldots, \alpha_n$ and the type $\beta$. While this 
representation works well in most instances, it can occasionally cause problems. In 
particular, consider the situation when $\beta$ is a type variable. In this case, it could be 
instantiated with a function type, thereby allowing the vector of arguments to become 
longer. Having to consider this possibility complicates the unification computation 
on types and also leads to several special instructions to facilitate the compilation 
of unification with function types. In our setting, types do not have a role to play 
in term unification and hence it is not important to be able to see the arguments 
and target type of a function type in any special way. Moreover, we expect types 
themselves to be infrequently accessed and, when they are accessed, we expect them 
to be even more infrequently complicated function types; the latter is especially true 
because there is never a need in our context to look at skeleton types whereas this is 
needed in the setting of Teyjus Version 1. Consequently, we have treated the function 
type constructor as just another binary function symbol with no special properties in 
our representation. This also has the benefit of further simplifying our already simple 
adaptation of the instruction set underlying type unification in Teyjus Version 1. 
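
The consequence of treating the function type constructor as an ordinary binary symbol can be seen in a small C sketch. The record Type and the functions deref, arrow_arity and target_type are hypothetical names used only for illustration; in this curried form the number of arguments is not stored anywhere and is found by walking the right spine of the type.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TY_VAR, TY_CONST, TY_ARROW } TypeTag;

typedef struct Type Type;
struct Type {
    TypeTag tag;
    Type   *left;     /* TY_ARROW: argument type */
    Type   *right;    /* TY_ARROW: result type */
    Type   *binding;  /* TY_VAR: NULL while unbound */
};

/* Follow variable bindings to the representative type. */
static const Type *deref(const Type *t) {
    while (t->tag == TY_VAR && t->binding != NULL) t = t->binding;
    return t;
}

/* Count the arguments of a curried function type by walking its right spine. */
static int arrow_arity(const Type *t) {
    int n = 0;
    for (t = deref(t); t->tag == TY_ARROW; t = deref(t->right)) n++;
    return n;
}

/* The final target type reached at the end of the right spine. */
static const Type *target_type(const Type *t) {
    for (t = deref(t); t->tag == TY_ARROW; t = deref(t->right)) { }
    return t;
}
```

If the target of int -> B is a variable B that later becomes bound to int -> int, the same cells then denote a two-argument function type; no argument vector has to be extended, which is the complication noted above for the un-curried form.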



Chapter 8 



An Implementation of λProlog 

We have, at this point, presented a complete picture of an abstract machine and 
compilation model that could underlie an implementation of λProlog. As part of 
this thesis, we have undertaken such an implementation. This implementation is 
referred to as Version 2 of the Teyjus system or Teyjus Version 2 for short.¹ There 
are three purposes for undertaking this implementation. First, we have wanted to 
provide researchers interested in experimenting with the specification and prototyping 
capabilities of λProlog a concrete and efficient vehicle to use in such endeavors. Teyjus 
Version 2 already serves this purpose by forming a suite together with the Abella 
system [17] that is freely distributed by our research group to support specification, 
prototyping and reasoning about specifications [18, 19]. Second, we want to evaluate 
the design ideas that we have developed and for this an actual implementation is 
essential. Finally, we believe that there are several language related issues that can be 
experimented with relative to λProlog and having a concrete implementation provides 
the means to do this in a more comprehensive fashion. 

In this chapter, we provide a high-level description of Teyjus Version 2. The 
particular motivations for building this system have imposed additional conditions on its 
structure. For example, the need to make it widely accessible has meant that we pay 
special attention to its portability to different architectures and operating systems. 
Similarly, if Teyjus Version 2 is to be useful for evaluation and language extension 
experiments, then it must have an open and easy to modify structure as a software 
system. Our discussion below highlights the impact of such considerations in the 
overall system that we have constructed. 

8.1 The Language Implemented 

The λProlog language also encompasses a notion of modularity for organizing large 
programs. The support of this feature is orthogonal to the issues considered by this 
thesis, but a brief discussion of it is nevertheless relevant to providing a proper 
description of Teyjus Version 2. 

¹ As is typical of a software project of significant size, Teyjus Version 2 has involved contributions 
from others. However, the underlying implementation ideas for all parts except the treatment of 
modularity notions in λProlog have derived from this thesis and the bulk of the compiler and the 
abstract machine emulator is also attributable to it. 

The notion of module underlying λProlog permits the space of names and 
predicate definitions to be decomposed into smaller units. The interface of each such 
unit is provided by a signature, which includes the names, i.e., type constructors 
and constants, that are publicly visible. The implementation of this interface 
constitutes an accompanying module that comprises the predicate definitions as well 
as the declarations of the global and local names needed in the module. An 
important interaction between λProlog program units takes place through the medium of 
module or signature accumulation that allows the set of names and the definitions 
of predicates available in a particular unit to be extended by using the declarations 
in another unit. The meaning of this construct can be understood as inlining the 
contents of the accumulated signature or module at the place of its occurrence, but 
only after effecting a renaming of non-global names to avoid inadvertent and illegal 
confusion. 

As a concrete example, we can examine how the program prenex introduced in 
Section 2.3 can be organized into different modules. A conceptual consideration of 
the problem to be solved leads naturally to the following four components: 

1. a general framework for representing first-order logics, i.e., one that identifies 
the term and formula categories of expressions and that defines the logic con- 
nectives and quantifiers under consideration; 

2. the specification of the vocabulary of particular versions of the logic, i.e., a 
component that identifies the sets of constant, function, and predicate symbols 
of interest; 

3. a specification of syntactic properties of first-order formulas, such as quantifier- 
freeness, that are of general use in addition to being useful in defining the 
prenexing transformation; and 

4. a specification of the particular transformations for calculating a prenex normal 
form of a given formula. 

These logical components can be mapped into the specification of the four 
signatures with names logic_base, logic_vocab, syntax_properties and pnf, and the module 
with name pnf, shown in Figure 8.1. The reading of the displayed program should be 
based on an understanding of new syntactic constructs in the following way. First, the 
keyword sig or module followed by a name indicates the start of the specification of 
the signature or module, respectively. Next, the accumulation of a signature is 
denoted by using accum_sig followed by the name of the signature, whereas accumulate 
is used to indicate the accumulation of the module whose name follows the 
keyword. Finally, exportdef and useonly combine a type declaration with a 
"boundary" description for predicate definitions: the former indicates that all the definitions 
of a predicate are contained in this module, and it is illegal to extend them in any 
context into which this module is accumulated; the latter is a directive that 
complements the former by specifying that the module corresponding to the signature 
in which it appears (or, more directly, the module in which it appears) may use the 
predicate identified but guarantees not to extend its definition. 

    sig logic_base. 
    kind term, form     type. 
    % Followed by the declarations for other logical connectives and quantifiers. 

    sig logic_vocab. 
    accum_sig logic_base. 
    % Followed by the declarations for the constants, functions and predicates in the logic. 

    sig syntax_properties. 
    accum_sig logic_base. 
    exportdef quantifier_free, is_atomic     form -> o. 
    exportdef is_term                        term -> o. 

    module syntax_properties. 
    % Followed by the definitions of quantifier_free, is_atomic and is_term. 

    sig pnf. 
    accum_sig logic_base, logic_vocab. 
    exportdef prenex                         form -> form -> o. 
    useonly quantifier_free, is_atomic       form -> o. 
    useonly is_term                          term -> o. 

    module pnf. 
    accumulate syntax_properties. 
    accum_sig logic_base, logic_vocab. 
    type merge                               form -> form -> o. 
    % Followed by the definitions of prenex and merge. 

Figure 8.1: A module based organization of quantifier_free. 

8.2 Structure of the Implementation 

The abstract machine is realized in our implementation through a software emulator. 
Thus, the overall software system has at least two components: a compiler and an 
emulator. We have also chosen to channel the interaction between the compiler and 
the emulator through a bytecode file that is written to and read from memory. The 
support of reading this file into the emulator so as to set the emulator in a state where 
it is ready to respond to user provided queries is realized by a third system called a 
loader. 

An important issue to consider is what constitutes the appropriate unit for com- 
pilation. One simple possibility, in the context of the module system described in 
the previous section, is for the compiler to inline all the accumulated signatures and 
modules directly into the module being processed and to produce a bytecode file from 
this (large) collection. This is, in fact, the approach used in Version 1 of the Teyjus 
system. However, this approach does not provide true support of modularity, partic- 
ular aspects of which are the ability to compile and test modules separately and to 
reuse the results of compilation of common modules in different systems. In light of 
this fact, Teyjus Version 2 supports the ability to compile component modules sep- 
arately and to realize the combination inherent in accumulation through a separate 
linking phase. Consequently, the overall system includes a fourth component. This 
is a linker that has the task of looking at a collection of (partial) bytecode files and 
producing from this one complete bytecode file based on the relevant accumulation 
information also contained in the starting files. 

Separate compilation generally introduces difficulties in performing global com- 
piler optimizations because the visibility of code is limited. In our context, at least 
one of the optimizations that is directly impacted is the reduction of runtime type 
associations with predicate occurrences at the heads of clauses and at the heads of 
goals: the analysis discussed in Section 7.3 for this purpose requires knowledge of 
the complete set of defining clauses for relevant predicates, but this is not possible to 
have if the definition could be extended by the code in an accumulated module that 
is not being looked at during compilation of the parent module. However, the exportdef 
annotation discussed in the previous section provides a partial solution here. In 
particular, this annotation tells the compiler that the complete set is in fact available 
in relevant cases so that it can still perform the optimization in question. 

The primary function of the compiler is to translate λProlog modules into bytecode 
form. However, it has the capability to examine λProlog syntax relative to the name 
declarations contained in a module and this functionality is useful in one more place: 
in parsing user queries. Conceptually this process works in the following way in 
Teyjus Version 2. When requested to set up for queries against the declarations in 
a particular module, the top-level interface invokes the loader to prime the emulator 
with the declarations in that module. Simultaneously, the loader creates relevant 
symbol tables for the compiler to use in parsing queries relative to the vocabulary 
provided by the module. Once the loading is complete, an interaction mode is entered. 
In this mode, each time a user provides a query, the compiler is invoked to parse 
it. The resulting structure is then returned to the top-level system which wraps 
it within the solve predicate described in Section 6.5 and then passes this along to 
the emulator which proceeds to solve it. A fine point to note about this scheme is 
that it means that top-level queries are treated in an interpreted manner. It is also 
possible to compile the structures resulting from parsing queries into bytecode form. 
A realization along these lines actually has advantages over the interpretation based 
one but its development is left to future work. 

We conclude this section with a discussion of two considerations that have im- 
pacted the form of the actual implementation. 

The first consideration is that we have wanted an implementation that is easy to 
read and modify. This means that it is best to use a genuinely high-level language — 
such as a functional or a logic programming language — wherever this choice does 
not impact adversely on efficiency. This condition holds for all those parts of the 
system in which closeness to the underlying machine architecture does not dictate the 
quality of performance. Specific parts that satisfy this requirement are the compiler 
and the top-level interface. These components have therefore been developed in the 
functional language OCaml. On the other hand, the efficiency of the emulator does 
depend on having access to aspects of the machine architecture. For this reason the 
language C has been chosen for implementing this component.² The decision to use 
different languages for different components brings certain complexities to the overall 
implementation. For example, the top-level interface has to rely on the functionality 
of both the compiler and the emulator and hence language inter-operability is a 
concern. Similarly, knowledge of aspects such as the set of machine instructions 
needs to be shared between the compiler and the emulator and such sharing should 
be explicit for the ease of modification. We discuss the way in which we have dealt 
with such complexities in Section 8.4. 

² The linker and loader might well have been implemented in OCaml but they have in fact been 
implemented in C. 

The second consideration is the portability of our system to different actual 
machine architectures. Although the use of OCaml naturally removes this burden 
from the development of the compiler, special attention is still needed in the C based 
realization of the emulator to meet this goal: the low-level data structures should be 
designed in a way that is not particularized to any actual machine architecture. This 
topic is discussed in detail in Section 8.3. 

An interesting statistic is the size of the different components of our system. 
The compiler comprises roughly 20,000 lines of OCaml code whereas the emulator, 
the linker and the loader comprise about 26,000, 4,500 and 2,000 lines of C code, 
respectively. 

8.3 Term Representation and Portability 

Portability is an important property of our system, the consideration of which di- 
rectly affects the design of the C based emulator, in particular the realization of term 
and type representations introduced in Section 5.2 and Section 7.4 respectively. A 
conventional C approach to realizing such encodings is to give explicit control over 
the layout of the corresponding memory units by specifying bit patterns within a 
word. For example, in Version 1 of the Teyjus system that assumes that words are 
32-bits long, the higher-end 4 bits of a word are used to record the category tags of 
terms, additional numeric properties such as the universe indexes of logical variables 
and constants are encoded by 10 bits, and the addresses of subterms take the lower 28 
bits of a word. However, the hard-coded bit patterns make the implementation heav- 
ily depend on the underlying machine architecture: Teyjus Version 1, for instance, 
cannot run on 64-bit machines. 
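
The bit-pattern style of encoding can be sketched in C as follows. Only the 4-bit tag and the 28-bit address field of a 32-bit word are modelled; the names Word, make_word, word_tag and word_addr are hypothetical and the code is not taken from Teyjus Version 1.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_BITS  4u
#define ADDR_BITS 28u
#define ADDR_MASK ((UINT32_C(1) << ADDR_BITS) - 1u)

typedef uint32_t Word;  /* the encoding assumes words of exactly 32 bits */

/* Pack a category tag into the high 4 bits and an address into the low 28. */
static Word make_word(uint32_t tag, uint32_t addr) {
    assert(tag < (UINT32_C(1) << TAG_BITS) && addr <= ADDR_MASK);
    return (tag << ADDR_BITS) | addr;
}

/* Recognition and decomposition are single shift and mask operations. */
static uint32_t word_tag(Word w)  { return w >> ADDR_BITS; }
static uint32_t word_addr(Word w) { return w & ADDR_MASK; }
```

Decomposition is cheap, but an address field of 28 bits cannot hold a 64-bit pointer, which is the kind of dependency on word length discussed above.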

A natural way to eliminate this sort of hardware dependency is to use a high- 
level data structure provided by the implementation language to fulfill the encoding 
task, so that the actual machine memory layout can be decided by the 
underlying compiler. In the context of C, structures are an encoding facility of this 
sort. Based on an understanding of the alignment rules of C compilers, the structure 
types corresponding to terms and types can be designed in a form from which the 
actual memory layout closely resembles that of the bit pattern method. For 
instance, a field of unsigned 8-bit integer type can be used to encode the category 
tag of terms, and by positioning this field as the first in the structure declarations, 
the first 8 bits of an encoded term can be controlled to always contain the category 
information; fields of suitable types can be used for the additional information of each 
term category and among them, addresses can be directly encoded as C pointers; 
finally, a generic term can be used to control the minimum size of terms so that they 
are always aligned to the word boundary of the underlying machine architecture, as 
well as to indicate the position of the category tag. The above discussion can be 
visualized through the declarations and the corresponding space allocations shown in 
Figure 8.2. 
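
A compilable variant of the declarations in Figure 8.2 shows how the tag can be read without knowing the category of a term. This is a sketch under assumptions: the fixed-width types of the standard header stdint.h are substituted for the integer type names used in the figure, and the tag values TAG_CONST and TAG_APP and the helper term_tag are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  T_TAG;      /* category tag */
typedef uint16_t T_UNIVIND;  /* universe index */
typedef uint16_t T_ARITY;    /* application arity */
typedef uint32_t T_CSTIND;   /* constant table index */

enum { TAG_CONST = 1, TAG_APP = 2 };  /* hypothetical tag values */

/* Generic term: the pointer-sized placeholder fixes the minimum size and
   the alignment of a term on the machine at hand. */
typedef struct { T_TAG tag; void *placeHolder; } T_TERM;

typedef struct { T_TAG tag; T_UNIVIND univind; T_CSTIND cstind; } T_CONST;

typedef struct {
    T_TAG    tag;
    T_ARITY  arity;
    T_TERM  *function;
    T_TERM  *args;
} T_APPLICATION;

/* Because the tag is the first member of every term structure, it can be
   read through a pointer to any of them. */
static T_TAG term_tag(const void *t) {
    return *(const T_TAG *)t;
}
```

The C compiler chooses the padding after the tag, so the same declarations serve on 32-bit and 64-bit machines; only sizeof(T_TERM) changes.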

The use of structures in C for data encodings eliminates the dependency 
of our system on the word lengths of actual machine architectures. However, this 
method may have undesired impacts on the performance of the emulator. First of all, 
it can be observed that the alignment of structure fields carried out by C compilers 
can potentially result in gaps between useful information within a word and makes 
the encoding less compact than the bit-pattern based one. Second, the 
recognition and decomposition of terms now have more overhead: as opposed to 
simple bitwise operations, these computations now require access to structure fields, 
which is a more complicated operation and consumes more CPU cycles. 

The structure based approach is adopted in the realization of data encoding in 
Teyjus Version 2. This approach has made our system portable to different machine 
architectures, but could potentially incur additional performance costs. Based on 
the primary usage of our system, which is to serve as an experimental framework 
for assessing the efficacy of implementation ideas of XProlog, we argue that system 
portability is a more important concern compared with the possible efficiency im- 
provement that can be obtained from code tuning at the software development level. 
Moreover, it should also be observed that the conceptual design of term and type 
representations in our abstract machine does not prohibit the bit pattern approach. 
When the system is used in a performance critical context, this approach can still be 
adopted to hard-wire the system to a particular machine architecture. In our software 
implementation, the representations of data are encapsulated into a separate module. 
The adjustments needed for changing their actual realization are thus limited to this 
module and can be made without affecting its interface and usage. 

8.4 Issues Related to Multiple Implementation Languages 

As discussed in Section 8.2, driven by the flexibility requirement, our compiler is 
realized in a high-level language that differs from the one chosen for the other sys- 
tem components. This discrepancy, however, poses implementation challenges with 
regard to realizing the communication between the compiler and the emulator and 
maintaining the integrity of the software. Discussions in this section are focused on 
these difficulties and our solutions to them. 
    typedef uint8   T_TAG;         // type of category tag 
    typedef uint16  T_UNIVIND;     // type of universe index 
    typedef uint16  T_ARITY;       // type of application arity 
    typedef uint32  T_CSTIND;      // type of constant table index 

    typedef struct { 
        T_TAG      tag; 
        void*      placeHolder; 
    } T_TERM; 

    typedef struct { 
        T_TAG      tag; 
        T_UNIVIND  univind; 
        T_CSTIND   cstind; 
    } T_CONST; 

    typedef struct { 
        T_TAG      tag; 
        T_ARITY    arity; 
        T_TERM*    function; 
        T_TERM*    args; 
    } T_APPLICATION; 

    [Layout diagrams for 64-bit and 32-bit words omitted; in each, the tag field 
    occupies the highest-order byte (bits 63-56 and bits 31-24, respectively).] 

Figure 8.2: Examples of data layout on actual machine architectures. 

The interaction between the compiler and emulator can occur in two ways. First, 
the compilation result of a program has to be eventually interpreted by the emulator. 
This sort of communication is carried out indirectly through bytecode files and is 
consequently not affected by the particular language choices of the system compo- 
nents. However, a direct interaction between the compiler and the emulator is needed 
for handling top-level queries as discussed in Section 8.2. Specifically, the runtime 
execution should pass from the emulator to the compiler once a query is asked at 
the top-level; after performing necessary parsing work, the compiler should pass the 
result back and let the emulator take over the control again. The representation of 
the query differs in the settings in which it is needed — it is denoted as an abstract 
syntax tree during compilation and should be characterized by the low-level abstract 
machine data encoding in the emulator — and consequently requires a translation from 
the former to the latter. A difficulty is then introduced in realizing this process by 
the choice of different implementation languages for the compiler and emulator: the 
translation has to be carried across the language boundary between OCaml and C. 

One way to solve the above problem is to take advantage of the capability OCaml 
has of directly manipulating the memory of C: with an understanding of the 
emulator's data representation, the compiler can take full control of constructing 
the relevant terms and types on the emulator's heap. However, a closer examination 
reveals that this choice is not desirable. First, from the perspective of modularity, 
this method unnecessarily couples the implementation of the compiler with that of 
the emulator by an agreement on the format of the emulator's data representation. 
Second, it also complicates the actual software implementation by requiring special 
effort to protect the segment of memory that the compiler writes to from the garbage 
collector for OCaml. For these reasons, an alternative approach is used in our imple- 
mentation. Under this scheme, the task of constructing an emulator term is separated 
into smaller steps that are carried out both by the compiler and the emulator: the 
compiler is responsible for providing basic guidance on term creation with simple 
information such as the term's category and additional numeric properties, for instance 
the universe index; the actual deployment of the term into the emulator's memory 
and the setting up of references to subcomponents in the graphical representation of 
the term is locally maintained by the emulator. Specifically, for each kind of term, an 
OCaml function is implemented that invokes a corresponding term creation routine 
of the emulator (in C). The parameter passing between these functions is limited to 
data of simple types such as integers. By recursing through the abstract syntax rep- 
resentation of the term from the top-level, the compiler issues term creation requests 
for each subterm through the described OCaml functions, which eventually dispatch 
to the emulator's term construction routines. When invoked, the emulator's term
construction functions decide on the format of the subterm being
created and connect it to its parent according to location information maintained
internally on a temporary stack. The actual realization of the described scheme
is based on the foreign language interface provided by OCaml. Invocations between 
OCaml and C functions in both directions are used. 
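
The division of labour described above can be illustrated with a small model. The following Python sketch is not the Teyjus code, which uses OCaml on the compiler side and C on the emulator side; it only mimics the protocol under stated assumptions. The emulator is modelled by a class whose methods accept nothing but integers, the heap is a Python list, and the names Emulator, build_const, build_app and emit, as well as the integer codes for term categories, are hypothetical.

```python
class Emulator:
    """Models the emulator side: it alone decides where cells live."""

    CONST, APP = 0, 1  # hypothetical integer codes for term categories

    def __init__(self):
        self.heap = []     # heap cells; a cell is a tag followed by its fields
        self.pending = []  # temporary stack of (cell, field) slots awaiting a subterm
        self.root = None   # address of the top-level term

    def _place(self, cell):
        """Store a cell and link it into the slot its parent reserved for it."""
        self.heap.append(cell)
        addr = len(self.heap) - 1
        if self.pending:
            parent, field = self.pending.pop()
            self.heap[parent][field] = addr
        else:
            self.root = addr
        return addr

    def build_const(self, const_index, universe_index):
        self._place([self.CONST, const_index, universe_index])

    def build_app(self, arity):
        # Reserve one field for the head and one per argument.
        addr = self._place([self.APP] + [None] * (arity + 1))
        # Push the slots in reverse so that the head is filled first.
        for field in range(arity + 1, 0, -1):
            self.pending.append((addr, field))


def emit(term, emu):
    """Models the compiler side: only small integers cross the boundary."""
    if term[0] == "const":
        _, const_index, universe_index = term
        emu.build_const(const_index, universe_index)
    else:  # ("app", head, [arg, ...])
        _, head, args = term
        emu.build_app(len(args))
        emit(head, emu)
        for arg in args:
            emit(arg, emu)


# The term (f a (g b)) with constants numbered 0..3; b has universe index 1.
example = ("app", ("const", 0, 0),
           [("const", 1, 0), ("app", ("const", 2, 0), [("const", 3, 1)])])
emu = Emulator()
emit(example, emu)
```

In this model the compiler never sees a heap address: the emulator pops the next open slot from its temporary stack whenever a subterm arrives, which is the role the text assigns to the internally maintained location information.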

In addition to the interaction issue discussed above, the choice of multiple implementation
languages causes another problem with regard to maintaining the integrity
of our software realization. In particular, the problem arises in the encoding of concepts
that must be known both to the compiler and to other parts of the system.
An example of this sort is the set of abstract machine instructions, which pervade
all the system components: they are generated by the compiler, processed by the
linker and loader and eventually interpreted by the emulator. Consequently, a format
for their encoding should be agreed on by the entire system. Specific information of
this sort includes the op-code, the number of arguments and the representation of
each kind of argument, such as register numbers, environment frame offsets
and references to other instructions. The shared view of such data naturally
requires two versions of its encoding, which, of course, can simply be hard
coded in OCaml and C respectively. However, the duplication of information that is
conceptually the same introduces undesirable costs in maintaining consistency
through modifications, which could be required frequently in the course of exploring
new design ideas for our language. To avoid this cost, an approach based on automatic
code generation is adopted in our implementation. Specifically, a simple high-level
language is designed for specifying the conceptual format of instructions, with
constructs that can be used to describe the relevant properties of interest. A translator
is then provided, which parses a file written in this language and automatically
generates corresponding OCaml and C source code at the time that the system is
installed. As a result, any addition to or modification of the set of instructions or their
internal structures can be made uniformly in the specification file, and the burden
of ensuring consistency between the OCaml and C versions of the encoding is lifted
from the software developers.
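
The single-source idea can be made concrete with a toy generator. The Python sketch below is only an illustration under assumptions of our own: the specification is a Python list rather than a file in the actual specification language, the instruction names, op-codes and operand kinds in SPEC are hypothetical, and the generated text is far simpler than what a real translator would have to produce.

```python
# One shared description: (name, op-code, kinds of operands).
# All entries are hypothetical and serve only to illustrate the approach.
SPEC = [
    ("put_value", 0, ["reg", "reg"]),
    ("call",      1, ["int", "label"]),
    ("proceed",   2, []),
]


def gen_c(spec):
    """Emit C preprocessor definitions for the op-codes and operand counts."""
    lines = ["/* generated file: do not edit */"]
    for name, opcode, operands in spec:
        lines.append(f"#define OP_{name.upper()} {opcode}")
        lines.append(f"#define OP_{name.upper()}_NARGS {len(operands)}")
    return "\n".join(lines)


def gen_ocaml(spec):
    """Emit the matching OCaml definitions from the same description."""
    lines = ["(* generated file: do not edit *)"]
    for name, opcode, operands in spec:
        lines.append(f"let op_{name} = {opcode}")
        lines.append(f"let op_{name}_nargs = {len(operands)}")
    return "\n".join(lines)


print(gen_c(SPEC))
print(gen_ocaml(SPEC))
```

Because both outputs are derived from SPEC, renumbering an op-code or adding an operand requires a change in one place only, which is the consistency property the text is after.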

The issue discussed above is also pertinent to the encoding of built-in constants 
(such as the set of logical constants) and type constructors. Information about these 
constants such as the names, arity, and types has to be known both to the compiler 
(for the purpose of parsing and code generation) and to the emulator. A similar
translation approach has been adopted in this context as well, thereby eliminating the
replication of such information.

Evaluating the Design

Our focus in this chapter is on assessing the benefits of the ideas we have described
thus far with regard to implementing λProlog. There is a qualitative aspect to the
improvements these ideas bring about: they have considerably simplified the structure
of the abstract machine and have, in fact, made it possible to think of using this
machine as the target of compilation for other higher-order logic based languages.
However, the impact along this dimension can only be gauged indirectly, through
factors such as the relative ease with which the Teyjus Version 2 system has been 
developed, the extent to which this implementation is error-free and the uses that are 
eventually made of the abstract machine in implementing other related languages. A 
more direct and quantifiable effect of our ideas is on system performance. The avail- 
ability of two different implementations makes it possible for us to make comparisons 
and to thereby obtain an assessment as we do here. 

The key choice underlying this thesis is to orient an implementation of λProlog
around higher-order pattern unification instead of using the more general procedure 
described by Huet. One effect of this choice is to reduce the role of types at runtime: 
these types are now only needed for checking the identity of constants that have the 
same name. We have also described ideas for reducing the amount of type informa- 
tion that has to be dynamically processed even further. One of our goals now is to 
understand the impact of these ideas on real programs. We have constructed Teyjus 
Version 2 so that we can turn on and off these type-oriented optimizations relatively 
easily. We describe a set of experiments and the conclusions we draw from doing this 
in this chapter. 

The most interesting aspect is, however, a head-to-head comparison with Teyjus 
Version 1 towards gaining an understanding of the impact on overall performance of 
the different choices. Some care is needed, however, in making such a comparison. 
Certain choices have been made in the implementation of Teyjus Version 2 that have 
the virtues of enhancing its portability and openness at the expense of performance. 
A balanced comparison of the effect of the choice of unification procedure must factor
out the impact of this auxiliary decision. Towards this end, we first try to assess the
differences between the two systems over applications that do not call on higher-order
unification and the mechanisms used to support it, and then use this information to
properly understand the differences on real higher-order applications of the language.

The rest of this chapter is structured as follows. In the first section, we describe
experiments conducted towards understanding the impact of the choice we have made 
in low-level term representation. In Section 9.2 we study the benefits of the optimiza- 
tions in the treatment of types. Section 9.3 is devoted to a comparison of the two 
different versions of Teyjus on higher-order applications. Section 9.4 concludes the 
chapter with a summary of the results of our studies. 

Our study in this chapter is based on actual λProlog programs whose functionality
and characteristics are described as relevant. The code for all these programs can be
obtained from the Teyjus web site at http://code.google.com/p/teyjus/.

9.1 The Impact of Low-Level Term Representation 

The earlier version of the Teyjus system uses a highly optimized form of represen- 
tation for terms. In particular, that implementation assumes a 32 bit word and 
hard-codes the use of particular parts of such a word to encode specific components 
of the information contained in the term. This knowledge is then used to define bit 
patterns to extract the relevant information. Finally the use of these bit patterns is 
realized through macros in the C code implementing higher level functionality. While 
such a low-level encoding has performance benefits, it also has drawbacks at the level 
of portability. For example, Teyjus Version 1 can be run only on 32 bit architectures
and hence cannot take advantage of newer, faster 64 bit machines that also have larger
address spaces. As another example, since references are encoded using only a frag- 
ment of a 32 bit word, the system has to rely on special operating system capabilities 
for mapping the heap onto a specific segment of a larger memory area. A result of 
this is that the system cannot be ported to a platform that is running an operating 
system that does not provide such mapping capabilities. 

Portability has been a major concern within Teyjus Version 2. For this reason 
we have avoided bit patterns and have instead relied on using C based structures 
and a general understanding of how a typical C compiler maps such structures onto 
memory. This has also meant using a more expensive structure based decomposition 
in accessing relevant components of a term. Finally, to facilitate debugging and code 
clarity and modifiability, we have used function calls rather than macros to realize 
access to data fields. All of these choices impact on performance but none of them 
are essential to the fundamental issue of how we treat higher-order unification; our 
implementation has, in fact, been modularized so that our present choices concerning 
the low-level treatment of terms can be replaced by ones closer to those used in 
Teyjus Version 1 for fixed architectures. Thus to get a more accurate assessment of 
the performance impact of our main ideas, it is necessary to factor out the effect of 
this auxiliary aspect.

To assess the impact of the differences in low-level representations, a comparison
was made of the performance of the two versions of the Teyjus system on a set of
λProlog programs. Care had to be exercised in choosing the programs for this study.
Obviously, these programs could not be ones that also exercised higher-order aspects
of the language; it is impossible to separate out the differences arising out of term
representation choices and those resulting from the treatment of higher-order unification
relative to such programs. However, first-order programs do provide a suitable
means for the desired comparison. First-order unification obtains the same kind of 
compilation and interpretive treatments in the processing model underlying both of 
the systems. Moreover, it is a reasonable hypothesis that the low-level representation 
choices affect first-order and higher-order programs in a similar way. Another aspect 
that we wished to factor out is the result of optimizing the treatment of types in 
Teyjus Version 2. However, this was easier to do: we needed simply to turn off the 
type optimizations in the newer implementation. 

The programs that we chose to use for our study based on the above considerations 
are then the ones described below. 

Mono Naive Rev This program implements naive reverse on monomorphic lists
that are represented using user-defined constructors. Specifically, a new sort i is
identified, two new constants mcons of type i -> (list i) -> (list i) and mnil of type
(list i) are defined, and the predicates rev of type (list i) -> (list i) -> o and append
of type (list i) -> (list i) -> (list i) -> o are defined through the following set of
clauses:

    rev mnil mnil.
    rev (mcons X L1) L2 :-
        rev L1 L3, append L3 (mcons X mnil) L2.

    append mnil L L.
    append (mcons X L1) L2 (mcons X L3) :- append L1 L2 L3.

The actual testing consisted of invoking rev 30,000 times on a collection of lists.

Poly Naive Rev This program is a polymorphic version of the naive reverse described
above. In particular, the types of the predicates rev and append in this
instance are

    (list A) -> (list A) -> o and (list A) -> (list A) -> (list A) -> o.

An important point concerning this test case is that lists were represented using
user defined constructors called pnil and pcons rather than the system defined list
constructors nil and ::. The actual testing consisted of invoking rev 30,000 times on
a collection of lists.

Mono Linear Rev This program implements tail recursive reverse on monomorphic
lists. Lists are represented the same way as in Mono Naive Rev. The predicate
rev is implemented by the following code.

    type rev (list i) -> (list i) -> o.
    rev L1 L2 :- rev_aux L1 mnil L2.

    type rev_aux (list i) -> (list i) -> (list i) -> o.
    rev_aux mnil L2 L2.
    rev_aux (mcons X L1) L2 L3 :-
        rev_aux L1 (mcons X L2) L3.

Testing in this case consisted of running rev 100,000 times on a 10 element list.

Poly Linear Rev This program implements tail recursive reverse on polymorphic
lists. The predicates rev and rev_aux have the polymorphic types

    (list A) -> (list A) -> o and (list A) -> (list A) -> (list A) -> o,

and similar definitions to those in Mono Linear Rev. As in Poly Naive Rev, lists
are represented in this example via user defined constructors. Testing consisted of
running rev 100,000 times on a 10 element list.

Poly Naive Rev* This test case was like Poly Naive Rev except that this time the
built-in representation of lists was used.

Poly Linear Rev* This test case was like Poly Linear Rev except that this time the
built-in representation of lists was used.

Red Black Tree This program implements a polymorphic version of red-black
trees. A kind btreety of arity one is defined to categorize the family of the trees. A
type color with the two constants red and black is also defined. The leaves and nodes
in a tree are encoded by constants empty and node of types

    btreety A and
    color -> A -> (btreety A) -> (btreety A) -> (btreety A).

                        Teyjus version 1    Teyjus version 2    Degradation
    Mono Naive Rev      1.51 secs           2.27 secs           50.3%
    Poly Naive Rev      1.81 secs           2.80 secs           54.7%
    Mono Linear Rev     1.18 secs           1.81 secs           53.4%
    Poly Linear Rev     1.47 secs           2.24 secs           52.3%
    Red Black Tree      2.7 secs            4.14 secs           53.3%
    First-order copy    1.11 secs           1.73 secs           55.9%
    Poly Naive Rev*     1.30 secs           1.65 secs           26.9%
    Poly Linear Rev*    1.05 secs           1.31 secs           24.8%

Table 9.1: Timing comparisons on first-order programs.



The arguments provided to node represent the color, the left subtree and the right
subtree. Predicates add and memb are defined to implement the insertion and search
operations respectively. Their types are declared as

    A -> (btreety A) -> (btreety A) -> o and
    A -> (btreety A) -> o.

The arguments of add correspond to the value to be inserted, the original tree and the 
tree after insertion, respeetively. The predicate memb takes as its arguments a value 
and a tree that is to be searched for this value. The testing consisted of creating a 
tree of 1500 integer values and then searching for each of the values in the tree. 

First-order Copy In this test, the program in Figure 2.1 for copying λ-terms was
used. However, the invocations of copy were all restricted to first-order structures, i.e.,
those constructed from only the constants a and app. Testing in this case consisted of
repeating 100,000 times the solution of the query (copy t R), where t is a first-order
term of depth 4.

Table 9.1 presents the results of running the test cases described with Teyjus
Version 1 (v 1.0-b32) and Teyjus Version 2 (v 2.0-b2) without type optimizations
on a 2.6GHz 32-bit i686 processor. The numbers in the middle two columns of the
table represent the CPU time taken by the execution of the programs. The last
column of numbers denotes the performance difference between the two versions of
the system, which is calculated by the following formula.

$$\frac{\text{execution time in Teyjus Version 2} - \text{execution time in Teyjus Version 1}}{\text{execution time in Teyjus Version 1}}$$

The first six rows of the table indicate a fairly consistent degradation arising out
of the low-level representation used for terms in the newer Teyjus system: averaged
across these examples, the degradation is about 53.3%. The degradation is substantially
less for the last two cases. This result actually accords with expectations. The
built-in constructors :: and nil are treated in a special way in our implementation
model. This treatment builds in the type optimizations for these constructors in a
way that is infeasible to turn off. Thus, in these cases the actual degradation due to
the unoptimized low-level representation of terms is partially offset by improvements
in the way types are handled. In interpreting the results of this section, therefore, we
shall disregard the data from the last two rows in Table 9.1.
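
The degradation figures can be recomputed from the timings in Table 9.1 using the formula given earlier in this section. The following Python sketch does so; the names timings, degradation and first_six are ours, and small differences in the last printed digit (for instance for Poly Linear Rev) presumably stem from rounding in the reported timings.

```python
# program: (Teyjus Version 1 secs, Teyjus Version 2 secs), copied from Table 9.1
timings = {
    "Mono Naive Rev":   (1.51, 2.27),
    "Poly Naive Rev":   (1.81, 2.80),
    "Mono Linear Rev":  (1.18, 1.81),
    "Poly Linear Rev":  (1.47, 2.24),
    "Red Black Tree":   (2.7, 4.14),
    "First-order copy": (1.11, 1.73),
    "Poly Naive Rev*":  (1.30, 1.65),
    "Poly Linear Rev*": (1.05, 1.31),
}


def degradation(v1_time, v2_time):
    """Slowdown of Version 2 relative to Version 1, in percent."""
    return (v2_time - v1_time) / v1_time * 100


# The last two rows use the built-in list constructors and are left out.
first_six = [name for name in timings if not name.endswith("*")]
average = sum(degradation(*timings[name]) for name in first_six) / len(first_six)

for name, (v1, v2) in timings.items():
    print(f"{name:18s} {degradation(v1, v2):5.1f}%")
print(f"average over the first six rows: {average:.1f}%")
```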

9.2 Impact of Type Optimizations 

As discussed in Chapter 7, there are two ways in which the type associations that
persist into execution are reduced in Teyjus Version 2. First, the list of types associ- 
ated with each constant occurring in terms is reduced by eliminating instantiations 
for variables that appear in the target type of the constant. Second, an analysis is 
carried out over clause definitions to identify those variables in the type of the pred- 
icates they define that have no effect on runtime computations; it is redundant to 
carry along bindings for these variables and hence these are eliminated. 

A measurement of the impact of the two different levels of type-related optimizations
was conducted by turning on and off the procedures in the compiler that
effect the optimizations. One set of programs over which testing might then be done
consists of those that are genuinely polymorphic in nature. The test cases Poly Naive
Rev, Poly Linear Rev and Red Black Tree introduced in the previous section can be
used as examples of this class. Another set of programs that would be useful to
test would be higher-order ones that represent typical applications of λProlog. The
following programs were included as representative of this class.

Typeinf This program infers principal type schemes for ML-like programs [30]. 
Inside it, the representation of the object-level types treats quantification explicitly 
and utilizes abstractions to capture the binding effect. A type inference algorithm 
similar to that in [13] was used, and the computation is specified in the $L_\lambda$-style.

Hcinterp This program implements an interpreter for a language based on first-order
Horn clauses [44]. The declarations in Figure 2.2 describe a signature for
representing such formulas. A predicate interp of type form -> form -> o is defined for
determining whether a given goal formula is derivable from a conjunction of definite
clauses. This program needs higher-order features because object-level quantification
is encoded within it through abstractions. An interesting aspect of this program is
that it does not statically fit within the higher-order pattern fragment. However, the
standard usage of this program ensures that it is dynamically in this fragment, i.e.,
it is only ever necessary to solve higher-order pattern unification problems during
computation.

                                     Teyjus version 2 (v 2.0-b2)
                       none         top-level             top-level and clauses
    Poly Naive Rev     2.80 secs    2.30 secs    17.9%    2.27 secs    18.9%
    Poly Linear Rev    2.24 secs    1.84 secs    17.9%    1.81 secs    19.2%
    Red Black Tree     4.14 secs    3.80 secs     8.2%    3.78 secs     8.7%
    Typeinf            1.27 secs    1.20 secs     5.5%    1.20 secs     5.5%
    Hcinterp           2.38 secs    2.14 secs    10.1%    2.14 secs    10.1%

Table 9.2: Timing comparison on type optimizations.

Polymorphic lists are used in the two higher-order programs. To focus attention
on the benefits that might be obtained from the type optimizations, we have replaced
the use of the system defined constructors for representing these lists with the user
defined constructors pcons and pnil introduced in the previous section.

The results of our experiments are presented in Table 9.2. The columns with tags
none, top-level and top-level and clauses denote the type optimization levels: no
type reduction, top-level constant type reduction only, and reductions for both top-level
constants and predicate definitions, respectively. The numbers of seconds in the
table correspond to the execution time of programs obtained with different levels of
type optimizations. The data for Poly Naive Rev and Poly Linear Rev are collected
from 100,000 invocations of rev on a 10 element list of type (list i). In the case of
Red Black Tree, the times that are measured are for creating a tree with 1,500 integer
elements and subsequently searching for each element. The numbers in the 4th and
6th columns indicate the percentage improvement resulting from the different levels
of type optimizations against a base that does not use any of the optimizations. From
the presented data, it can be observed that type optimizations, especially that for
top-level constants, have a noticeable impact on first-order polymorphic programs.
The improvements in the case of the higher-order programs are not so marked. This
observation also accords with intuitions. Many λProlog programs that use higher-order
features typically do so over monomorphic representations of objects, using polymorphism
only in utility predicates and data structures such as those implementing lists.
Type optimizations provide benefits only in those situations where there is genuine
use of polymorphism.

9.3 Impact of Higher-Order Pattern Unification

We now turn to measuring the effect of orienting the processing model around higher- 
order pattern unification rather than using Huet's general procedure. The testing in 
this context consists of comparing the execution times of Teyjus Version 1 and Teyjus
Version 2 on a collection of typical λProlog programs. The specific programs in our
suite consisted of Typeinf and Hcinterp described in the previous section and the 
following additional ones. 

Prenex This program implements a transformation from arbitrary formulas in a 
first-order logic into ones that are in prenex normal form. Abstractions in λ-terms
are used to capture the binding aspects of first-order quantifiers. The essential part 
of the program is presented in Figure 2.5. 

Compiler This program implements a compiler for a small imperative language 
with object-oriented features [32], including a bottom up parser, a continuation 
passing-style intermediate language, and generation of native byte code. 

Hcsyntax Relative to the signature specified in Figure 2.3, this program defines
the predicates goal and def_clause of type form -> o that serve to recognize formulas
whose syntax adheres to that of goal formulas and definite clauses in the setting of
first-order Horn clauses.

Tailrec This program describes the encoding of a simple functional programming 
language and implements a recognizer of tail recursive functions of arbitrary arity [44].
The concept of scope embodied in the object level language is explicitly encoded by 
abstractions, and augment and generic goals are used to realize recursion over such 
structure. 

All the programs in this test suite except for Hcinterp can be viewed as representatives
of the $L_\lambda$ style of programming. With regard to the usage of types, the following
observations can be made. The examples Prenex, Hcsyntax and Tailrec only use 
monomorphic types. Polymorphism is present in Typeinf, Compiler and Hcinterp, 
but as remarked in the previous section, such usage is only relevant to the encoding 
of lists as auxiliary data structures and is incidental to the essential computation 
carried out by these programs. In this set of tests, we have reverted to the use of 
built-in representations of lists rather than using user defined constructors. 

                Teyjus version 1    Teyjus version 2            Improvement
                                    actual       normalized
    Prenex      3.71 secs           1.77 secs    1.157 secs     68.8%
    Typeinf     2.53 secs           1.16 secs    0.758 secs     70.0%
    Compiler    2.05 secs           2.71 secs    1.771 secs     13.8%
    Hcinterp    1.58 secs           2.14 secs    1.399 secs     11.5%
    Hcsyntax    1.11 secs           1.75 secs    1.144 secs      3.0%
    Tailrec     1.90 secs           2.78 secs    1.817 secs      4.3%

Table 9.3: Timing comparisons on $L_\lambda$ programs.

The results of this set of experiments are presented in Table 9.3. The numbers of
seconds appearing in the 2nd and 3rd columns are the actual times taken by the
execution of the programs on the two versions of the system respectively. The numbers
appearing in the 4th column are a "normalized" execution time on Teyjus Version
2 obtained by correcting for the hypothesized degradation arising from our choice
of low-level term representation; the normalization amounts to dividing the actual
execution time on Teyjus Version 2 by the factor (1 + 53.3%). The percentages in the
last column of the table correspond to the improvement brought about by the new
system after the term encoding noise is factored out. The calculation is carried out
by the following formula.

$$\frac{\text{normalized execution time in Teyjus v2} - \text{execution time in Teyjus v1}}{\text{execution time in Teyjus v1}}$$
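
The normalization can be checked against the first rows of Table 9.3. The Python sketch below rests on two readings of ours that the text does not state: the tabulated normalized times are reproduced when the factor (1 + 53.3%) is rounded to 1.53, and the percentages in the last column are taken as positive when Version 2 is faster, that is, as the Version 1 time minus the normalized Version 2 time, relative to the Version 1 time. The names DEGRADATION_FACTOR, normalized and improvement are hypothetical.

```python
DEGRADATION_FACTOR = 1.53  # 1 + 53.3%, rounded; the rounding is our assumption

# program: (Teyjus Version 1 secs, Teyjus Version 2 secs), copied from Table 9.3
timings = {
    "Prenex":   (3.71, 1.77),
    "Typeinf":  (2.53, 1.16),
    "Compiler": (2.05, 2.71),
    "Hcinterp": (1.58, 2.14),
}


def normalized(v2_time, factor=DEGRADATION_FACTOR):
    """Version 2 time with the hypothesized representation overhead removed."""
    return v2_time / factor


def improvement(v1_time, v2_time):
    """Reduction relative to Version 1 in percent (positive: Version 2 faster)."""
    return (v1_time - normalized(v2_time)) / v1_time * 100


for name, (v1, v2) in timings.items():
    print(f"{name:10s} {normalized(v2):.3f} secs  {improvement(v1, v2):5.1f}%")
```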

Performance improvements of varying degrees in the different test cases can be
seen to result from using Teyjus Version 2. The execution time is substantially
reduced in the case of the first two programs. These programs use higher-order
pattern unification significantly and polymorphic typing is not used in the first and
only sparingly in the second. Thus the better performance is attributable in these
cases mostly to the higher-order pattern unification employed in the interpretive
unification process of the emulator. In the Compiler example, a significant part of
the computation is not higher-order although there are also parts that use λ-terms and
unification in a non-trivial way. Based on the earlier studies, we anticipate that type
optimizations contribute about 5%-6%, with the rest of the improvement coming
from the changed treatment of higher-order unification. The Hcinterp program uses
λ-terms and the syntax here does not even adhere to the higher-order pattern restriction.
However, by the time unification is considered in this case, most of the terms have, in
fact, become first-order in nature. Following the discussion in the previous section, it
can also be noticed that the improvement in this case is almost entirely attributable
to the type optimizations. There is virtually no change in the performance observed
over the last two programs. This is also understandable. These programs embody
only an analysis of the objects they work over: first-order formulas and functional
programs in the respective cases. The $L_\lambda$ style of programming results in the use of
only first-order unification in such analysis, higher-order pattern unification playing
a role only when a synthesis of new structure is also involved.

    Number of        Teyjus version 1    Teyjus version 2            Improvement
    abstractions                         actual       normalized
    1                0.06 secs           0.09 secs    0.059 secs      1.7%
    5                0.44 secs           0.44 secs    0.287 secs     34.6%
    10               0.99 secs           0.87 secs    0.569 secs     42.6%
    15               1.77 secs           1.45 secs    0.945 secs     46.5%

Table 9.4: Effect of searching in pattern unification problems.

A question that is interesting to analyze is what particular characteristics of unification
problems in the higher-order pattern fragment might cause a behavior difference
between Huet's procedure and a more targeted unification algorithm. Our
hypothesis, based on looking at the kinds of disagreement pairs that actually participate
in the interpretive unification process during the execution of Prenex and
Typeinf, is that a significant contributor to this difference is the presence during
unification of disagreement pairs of the form

$$\langle c_i, (H\ c_1 \ldots c_n) \rangle,$$

where $H$ is a logic variable, $c_1, \ldots, c_n$ are distinct constants with higher universe index
than $H$ and $i$ is some number between 1 and $n$. Given such a pair, Huet's unification
procedure attempts to solve it by somewhat blindly considering bindings for $H$ of the
form $\lambda x_1 \ldots \lambda x_n.\, x_j$, for all $j$ such that $1 \le j \le n$. This gives rise to an (admittedly shallow)
branching whose width in a depth-first search setting is controlled by the particular
value of $i$, assuming that we stop the search at the first point of success. On the
other hand, higher-order pattern unification treats such pairs differently, generating
the right substitution deterministically by immediately trying to match $c_i$ to one of
the constants in $c_1, \ldots, c_n$.
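
The contrast can be illustrated with a deliberately simplified cost model. The Python sketch below implements neither unification procedure; it only counts how many candidate bindings for H would be generated for a pair of the above form, assuming that the projections are tried in the order j = 1, 2, ... and that the search stops at the first success. The function names and the representation of constants as strings are hypothetical.

```python
def huet_style_projections(target, args):
    """Try the j-th projection for j = 1, 2, ... and stop at the first success.

    Returns (j, number of bindings attempted); j is None if no projection works.
    """
    attempts = 0
    for j, constant in enumerate(args, start=1):
        attempts += 1
        # Under the j-th projection, (H c_1 ... c_n) reduces to c_j.
        if constant == target:
            return j, attempts
    return None, attempts


def pattern_style_match(target, args):
    """Locate the target among the distinct arguments; one binding is generated."""
    position = {constant: j for j, constant in enumerate(args, start=1)}
    return position.get(target), 1


args = [f"c{k}" for k in range(1, 16)]          # c1 ... c15
worst = huet_style_projections("c15", args)     # target is the last argument
best = huet_style_projections("c1", args)       # target is the first argument
direct = pattern_style_match("c15", args)
print(worst, best, direct)
```

In this model the number of bindings attempted by the projection search grows with the position of the target among the arguments, while the pattern-style match always produces a single binding.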

To try to validate our hypothesis, we conducted an experiment using the copy
example. The queries we used in this context were of the form copy t Result, where
t is a term with the structure

    abs x1\ ... abs xn\ (app xn (app xn (app xn (app xn (app xn xn))))).

By setting the arguments of app to xn, the disagreement pairs that are generated take
the form $\langle c_n, (H\ c_1 \ldots c_n) \rangle$. Given the way substitutions are considered in Teyjus Version
1, $(n - 1)$ bindings are attempted for $H$ before the "correct" one for such a pair is
actually found.

Table 9.4 presents the results obtained from these experiments. Execution times shown
in this table result from 5,000 invocations of the given queries on the two systems.
The numbers in the 4th column are the normalized execution times on Teyjus Version
2. The last column denotes the performance difference obtained from viewing the
execution time on Teyjus Version 1 as the basis of comparison. An improvement
that is linear in the number of abstractions can be observed in this case.

The differences observed above could, of course, be the result of other factors
that we might have somehow overlooked in our analysis. To try to eliminate this
possibility, we conducted another set of experiments, ones in which the pairs generated
were such that the very first substitution considered for $H$ in the Teyjus Version 1
setting would be the right choice. Specifically, we once again tried queries of the form
copy t Result, but this time where t had the structure

    abs x1\ ... abs xn\ (app x1 (app x1 (app x1 (app x1 (app x1 x1))))).

By always using the bound variable x1 as the arguments of app, the disagreement
pairs generated are of the form $\langle c_1, (H\ c_1 \ldots c_n) \rangle$. The first substitution generated
for $H$ in Teyjus Version 1 succeeds for such pairs. We would therefore expect much
smaller differences with such queries. Table 9.5 presents the results obtained from the
new experiment; execution time is measured again for 5,000 invocations of the given
queries with the two versions of the system and the different columns have the same
explanations as before. The figures in this table show much smaller differences, thereby
conforming with our expectations. Combined with the earlier results, our hypothesis
that a specific branching behavior contributes significantly to the differences between
the two versions of the Teyjus system appears to be confirmed.

    Number of        Teyjus version 1    Teyjus version 2            Improvement
    abstractions                         actual       normalized
    1                0.06 secs           0.09 secs    0.059 secs      1.7%
    5                0.38 secs           0.50 secs    0.327 secs     13.9%
    10               0.72 secs           0.95 secs    0.621 secs     13.7%
    15               1.17 secs           1.56 secs    1.020 secs     12.9%

Table 9.5: Narrowing the effect of search in pattern unification.

Before concluding this section, it is useful to understand that while the observed
responses of the two versions of the Teyjus system agree on most practical programs
and queries, they also sometimes differ. When restricted to the Lλ fragment of λProlog,
it is sometimes possible that Teyjus Version 1 will produce an answer conditioned on
the solutions to a remaining collection of flexible-flexible disagreement pairs (that are
known to have at least one solution), whereas Teyjus Version 2 will solve these pairs
completely. In the other direction, there are examples of programs outside the Lλ
fragment on which Teyjus Version 1 will provide complete answers whereas Teyjus
Version 2 will stop at a point short of this. As an example of this latter kind, consider
the following program defining the predicate mapfun of type (list i) → (i → i) →
(list i) → o for some sort i:

mapfun nil F nil.

mapfun (X :: L1) F ((F X) :: L2) :- mapfun L1 F L2.

Intuitively, the predicate mapfun maps the elements in the first list argument to those
in the third by applying the function given by the second argument. Let g and a be
constants of types i → i and i respectively. The disagreement pair ⟨(F a), (g a)⟩ that
is generated in solving the query

?- mapfun (a :: nil) F ((g a) :: nil)

escapes the Lλ subset and hence is not solved in Teyjus Version 2; instead it is
simply produced as a remaining pair at the end of the computation. However, this
disagreement pair can be successfully solved by Huet's procedure, and so, when the
same query is provided to Teyjus Version 1, it will succeed with the two answer
substitutions ⟨F, λx g x⟩ and ⟨F, λx g a⟩.
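
The claim about the two answer substitutions can be checked mechanically. The following Python sketch is only an illustration and is not part of either Teyjus version. It assumes an encoding of terms as nested tuples, represents each candidate binding for F as a Python function, and confirms that both make (F a) equal to (g a). The helper is_pattern is a hypothetical, simplified test of the pattern restriction that considers only constant arguments and their universe indexes; the universe index 0 assigned to F and a is likewise an assumption.

```python
# Terms: a constant is a string; an application is a tuple (head, arg1, ...).

def solution_1(x):
    """F = λx. g x: imitate g and pass the argument through."""
    return ("g", x)

def solution_2(x):
    """F = λx. g a: imitate g and ignore the argument."""
    return ("g", "a")

def is_pattern(var_universe, args):
    """Simplified pattern test for a flexible term (V c1 ... cn).

    args is a list of (constant_name, universe_index) pairs. The term is
    accepted when the constants are pairwise distinct and each has a
    universe index strictly higher than that of the head variable V.
    Bound-variable arguments are not modelled in this sketch.
    """
    names = [name for name, _ in args]
    distinct = len(set(names)) == len(names)
    all_higher = all(universe > var_universe for _, universe in args)
    return distinct and all_higher

target = ("g", "a")

# Both candidate bindings solve the pair <(F a), (g a)>.
results = [solution_1("a") == target, solution_2("a") == target]

# The two bindings are genuinely different functions.
differ = solution_1("b") != solution_2("b")

# Assumption: F and the top-level constant a both live in universe 0,
# so (F a) falls outside the pattern class.
f_a_in_pattern_class = is_pattern(0, [("a", 0)])
```

Under these assumptions both bindings solve the pair, which is why a procedure that commits to a single most general unifier cannot treat (F a) as a pattern.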

9.4 A Summary of the Assessments 

We conclude this chapter by summarizing and consolidating the various observations 
contained in it concerning our design ideas and the specific realization of these in 
Version 2 of the Teyjus system. 

One major characteristic of the new version of the Teyjus system is its choice of 
low-level encoding of terms. The way we have chosen to do this has meant a degra- 
dation in speed of about 50%. While we have not measured this explicitly, it is likely 
that space usage is also impacted by this choice: hand-coded term representations 
are bound to be significantly more compact than ones generated by the C compiler 
based on structure declarations. One counter to these drawbacks is that by letting 
the real code be free of low-level decisions and hacking tricks, we have made it much 
more transparent, modular and error-free. A further point to note is that special 
low-level treatments can still be built in once an architecture has been selected by 
changing a particular module that deals with this issue in our implementation. A 
final point to note is that the way we have dealt with this issue leads naturally to 
an extremely portable system. We note in this context that such portability can 
also have an important impact on the "speed of execution" by allowing us to use 
newer and faster architectures to run λProlog programs. As a specific example, recall
that Teyjus Version 2, unlike Teyjus Version 1, can be built on 64-bit machines as
well and not just on 32-bit ones. Table 9.6 presents some data that is relevant in
this context. In particular, it shows the execution times for a set of queries made
against the Prenex, Typeinf and Compiler programs when running Teyjus Version 2
on a 2.6GHz 32-bit i686 and a 2.6GHz 64-bit x86 processor. The performance is
noticeably better on the 64-bit architecture.

              2.6GHz 32-bit i686    2.6GHz 64-bit x86
  Prenex      2.71 secs             1.25 secs
  Typeinf     1.16 secs             0.78 secs
  Compiler    2.71 secs             1.66 secs

Table 9.6: Comparing Teyjus Version 2 on different architectures.
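
The ratios implied by Table 9.6 can be computed directly. The short Python sketch below is purely arithmetic over the figures reported in the table; the dictionary layout and variable names are illustrative choices.

```python
# Execution times in seconds from Table 9.6: (32-bit i686, 64-bit x86).
times = {
    "Prenex": (2.71, 1.25),
    "Typeinf": (1.16, 0.78),
    "Compiler": (2.71, 1.66),
}

# Speedup of the 64-bit build relative to the 32-bit build.
speedups = {name: t32 / t64 for name, (t32, t64) in times.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

The resulting ratios lie between roughly 1.5 and 2.2 for these three programs.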

The second kind of conclusion concerns the benefit of using higher-order pattern
unification. The improvements from this take two forms. First, this algorithm
allows an efficient runtime type processing scheme that results in 5% to
18% speedups in the execution times for a collection of first-order and practical Lλ
programs that we tested. A further observation is that the two kinds of type
optimizations utilized in our compiler do not contribute evenly to the overall performance
improvements. In fact, most of the acceleration results from the reduction in type
annotations maintained with constants; the improvements from reductions in type
associations with predicate definitions are minor, especially for practically relevant
λProlog applications. The second kind of advantage resulting from using higher-order
pattern unification concerns the reduction in search. The improvement from
this is large, especially for λProlog programs used in the intended meta-programming
tasks. At a more detailed level, our analysis has also exposed the causes for such an
improvement in the treatment of search.

In addition to the impact on performance, orienting the implementation around a 
treatment of only higher-order pattern unification has the effect of considerably sim- 
plifying the structure of the system. Although not directly quantifiable, the benefits 
from this have been enormous. The instruction set for our abstract machine, espe- 
cially the part included for treating types, is much simplified. The uniform nature of 
these instructions now makes it possible to consider compiling other languages similar
to λProlog into them. The choice with regard to unification also eliminates branching in
its treatment, thereby also greatly simplifying the abstract machine. The impact
of this aspect should not be underestimated. The need to deal with a more complex 
unification procedure in an efficient fashion has made the code for Teyjus Version 
1 extremely complicated and, hence, error-prone and inscrutable. By contrast, we 
believe that even the realization of the abstract machine in Teyjus Version 2 is quite 
penetrable and easy to maintain and modify. 



Chapter 10 



Conclusion 

In this thesis, we have considered an abstract machine and compilation based
realization of the λProlog language that is oriented around higher-order pattern unification.
We have not limited the syntax of the language in order to use this restricted form of
unification. Rather, our approach has been to use the restriction dynamically: while
being prepared for arbitrary unification problems, an implementation based on our
ideas will solve completely only problems in the higher-order pattern class, leaving
any other problems as constraints that are either to be solved later, if subsequent
substitutions put them into the restricted class, or to be reported to the user as
qualifications on answer substitutions. This approach is obviously theoretically limited
in comparison with one that uses Huet's procedure for the full class of unification
problems in that it could result in uninformative answers being provided to the user
in certain cases; we observed an example of this kind in Section 9.3. However, our
approach is practically well-motivated: an empirical study of a large collection of real
programs in a λProlog-like setting has shown that virtually all unification problems
that are encountered during computation are either in the higher-order pattern or in
the even simpler first-order class [33]. Within this context, the unification algorithm
that we use is capable of solving flexible-flexible disagreement pairs and hence
sometimes has the advantage of providing more complete answers. From an implementation
perspective, using the restricted algorithm has the benefits of simplifying the
processing model by eliminating branching in search and greatly reducing the runtime role
of types.

At a concrete level, this thesis has developed an actual abstract machine and 
compilation techniques to complement the processing model described above. The 
structure that we have designed has several novel components. First, it uses a
representation of λ-terms based on an explicit substitution calculus and it includes a
reduction procedure for these terms that is optimized to the particular context of 
a higher-order logic programming language. Second, it seamlessly integrates an in- 
terpretive treatment of higher-order unification problems with a compilation based 
treatment of first-order unification that is driven by the terms that appear in the 
heads of clauses. Finally, it incorporates static analysis techniques to reduce even 
further the runtime presence of types. 

This thesis has also provided an actual implementation of λProlog based on the
design that it has proposed. This system, called Teyjus Version 2, also embodies several
interesting ideas. The two major requirements that have driven its development are
portability and an openness in structure that can be exploited in extending its ca- 
pabilities and in experimenting with different low-level design choices. These foci 
have led to implementation challenges that have also been addressed. To free the im- 
plementation from architecture specific decisions, we have pushed layout choices for 
terms to the C compiler, making use of a broad understanding of such compilers to 
obtain a tradeoff between efficiency and generality. To make the code structure
penetrable, we have used a genuinely high-level language (OCaml in this case) wherever
possible in the implementation. Since it is also imperative to use a low-level language 
(typically C) for efficiency reasons in certain parts of the system, we have had to deal 
with the issue of interoperability between implementation languages across a broad 
interface. An especially interesting aspect of the code that we have developed is the 
manner in which we have been able to realize the sharing of information about in- 
struction and general machine structure between the two languages without tedious 
and error-prone replication in the two settings. 

A final contribution of this thesis has been the evaluation of our design ideas and 
a general understanding of the costly aspects of higher-order unification. This part of 
our work has consisted of instrumenting the new implementation and an earlier one 
that utilizes Huet's original procedure for higher-order unification and of using these 
two systems in a series of experiments over a relevant collection of λProlog programs.

There have been four previous implementations of λProlog in addition to Version 1
of the Teyjus system that is discussed in this thesis. Three of these have
been interpreter based and have been realized exclusively in a high-level language:
specifically, in Prolog [35], Lisp [14] and SML [15, 64]. None of these systems
considered in any detail the special issues that arise in a low-level treatment of the
higher-order aspects of λProlog. The compilation based implementations have been
the Teyjus Version 1 discussed here and Prolog/Mali [7]. The Prolog/Mali system
achieves compilation indirectly by first translating λProlog programs into C code
and then compiling the resulting C code. The translation process utilizes a memory
management system called Mali that has been developed especially for logic
programming languages: in particular, translation is realized in the form of calls to
functions supported by this system. A more detailed comparison of the treatment of
the higher-order aspects of λProlog between the Prolog/Mali system and those in the
Teyjus family can be found in [40].

The work in this thesis can be extended in several ways. One interesting direction 
to pursue is that of incorporating a treatment of particular cases of higher-order 
pattern unification into the compilation structure rather than pushing this off entirely 
to the interpretive phase. An example of where such compilation might be useful is 
a situation that we discussed when analyzing the test programs Typeinf and Prenex
in Section 9.3. Here we observed that a common form for disagreement pairs is 

⟨t, (H c1 ... cn)⟩,

where t is a first-order term, and (H c1 ... cn) is a term in which H is a logic variable
and c1, ..., cn are distinct constants with higher universe indexes than that of H. The
term t is often obtained in these cases from one of the arguments of the clause head.
Compilation can therefore utilize the structure of t that is statically available. For
example, the instruction

get_structure Ai, f, n

can be enhanced so that when the dereferenced result of the term given by Ai is
actually a flexible higher-order pattern term, the execution of the following instructions
can be carried out in a "BND" mode and geared towards realizing the relevant parts
of the computation described in Figure 4.4. It can be observed from the
transformation rules of bnd that the argument list [c1, ..., cn] then has to be carried across
the instructions following the current get_structure. To take a concrete example,
suppose t is of the form (f X), where f is a constant and X is a subsequent occurrence
of a variable universally quantified at the clause head. In the immediately following
instruction unify_value corresponding to X, the list [c1, ..., cn] has to be input to an
interpretive bnd process.

This kind of passing on of the argument list of the dynamic term to later in- 
structions is not one that is necessary in a first-order setting and hence has not been 
considered in WAM-style compilation models. Two sorts of attempts were made dur- 
ing the design of our abstract machine for realizing this requirement, but neither of 
them led to a solution that we considered satisfactory. The unsuccessful attempts are 
nevertheless discussed below for the purpose of illustrating the problems that were 
identified. 

The first way of solving the problem that we considered is to set one of the data
registers Ai to refer to an argument vector when necessary and to use this register
as an explicit argument to the subsequent instructions. Taking the example (f X),
we can then have instructions such as the following:

get_structure A1, f, 1, A255
unify_value A2, A255

where A255 is the register holding the argument list. However, this solution has a
problem in that it adds more work to instructions that are also used for simple
first-order unification. This form of unification is assumed to occur much more frequently
and hence this approach could adversely affect the overall execution time.
The second method we have attempted is to use a special register, for example,
the register ArgVector, to refer to the argument vector. This register can then be set
in the execution of get_structure, to be checked by the following instructions when
necessary. However, a closer examination of this solution reveals that it actually
requires the term t from the clause head to be processed in a depth-first manner,
whereas the processing order of head unifications underlying WAM instructions is in
fact breadth-first. This can be illustrated by the following example. Suppose the head
of the clause that is to be compiled is of the form

foo ... (f (f X)) (g (g Y)),

where f and g are top-level constants, and X and Y are second or later occurrences of
variables universally quantified at the front of the clause. The instructions generated
in our implementation take the following structure:



foo:
  L1:  get_structure   A1, f, 1
       unify_variable  A3
  L2:  get_structure   A2, g, 1
       unify_variable  A4
  L3:  get_structure   A3, f, 1
       unify_value     X
  L4:  get_structure   A4, g, 1
       unify_value     Y

Now suppose the goal to be solved actually takes the form

foo ... (G c1 ... cn) F,

and further, assume the instruction get_structure is enhanced to deal with
higher-order patterns in the way described above. Then the execution of this instruction at
label L1 sets the register ArgVector to refer to the argument list [c1, ..., cn], which is
meant to be used by the get_structure and unify_value instructions following label
L3. However, it can be observed that the execution of get_structure at label L2
overwrites ArgVector with an empty list.
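
The overwriting problem can be reproduced with a small simulation. The Python sketch below is a deliberately reduced model rather than the actual abstract machine: it tracks only the side effect that the enhanced get_structure is assumed to have on a single ArgVector register, and the tuple encoding of the goal arguments is an illustrative assumption.

```python
def run_head(goal_args):
    """Simulate the ArgVector side effect of get_structure at L1 and L2.

    Each goal argument is either ("flex", head_variable, [constants]) for a
    flexible pattern term, or ("var", name) for an unbound variable. The
    instructions at L1 and L2 run before L3 because head arguments are
    processed breadth-first.
    """
    arg_vector = []
    trace = []
    for label, term in zip(("L1", "L2"), goal_args):
        if term[0] == "flex":
            # (G c1 ... cn): remember the argument list for later instructions.
            arg_vector = list(term[2])
        else:
            # Ordinary first-order case: there is no argument list to carry.
            arg_vector = []
        trace.append((label, list(arg_vector)))
    # This is the value that the instructions following L3 would observe.
    return arg_vector, trace

# Goal of the form foo ... (G c1 c2) F.
goal = [("flex", "G", ["c1", "c2"]), ("var", "F")]
seen_at_l3, trace = run_head(goal)
print(trace)
print(seen_at_l3)
```

In this model the list recorded at L1 is lost before the instructions at L3 can use it, which is exactly the mismatch between depth-first use and breadth-first processing.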

A way to overcome this problem is to add a segment of instructions that are only
executed in the "BND" mode. For example, when the argument (f (f X)) of foo is
considered in isolation, we can have the following instructions generated.

  L1:  get_structure   A1, f, 1, L5
       unify_variable  A2
  L2:  get_structure   A2, f, 1, L6
       unify_value     X
       goto            END
  L5:  unify_variable  A2
       bnd             A2, f, 1
       unify_value     X
       goto            END
  L6:  unify_value     X
       goto            END
  END:



In the code above, assume that the additional label argument to get_structure corre- 
sponds to the start of the instruction sequence that must be executed in the situation 
when the dynamic term is of the flexible higher-order pattern form discussed. Fur- 
ther, assume bnd and goto are two new instructions. The former carries out the 
corresponding binding actions in the rigid-flexible case specified in Figure 4.4, and 
the latter is a simple jump to the given address. 

The problem with this method is obvious: viewing the entire clause head as an
application, the size of the generated code grows exponentially in the total number
of applications contained within it. The compilation result is not satisfactory even for
our original foo example. For example, here we would get the rather long sequence
shown below:



foo:
  L1:  get_structure   A1, f, 1, L5
       unify_variable  A3
  L2:  get_structure   A2, g, 1, L6
       unify_variable  A4
  L3:  get_structure   A3, f, 1, L7
       unify_value     X
  L4:  get_structure   A4, g, 1, L8
       unify_value     Y
       goto            END
  L5:  unify_variable  A3
       bnd             A3, f, 1
       unify_value     X
       get_structure   A2, g, 1, L6
       unify_variable  A4
       get_structure   A3, f, 1, L7
       unify_value     X
       get_structure   A4, g, 1, L8
       unify_value     Y
       goto            END
  L6:

In future research, the feasibility of the methods proposed above can be
further explored. A closer study can be conducted of the impact of each of
them on actual λProlog programs. Practical adjustments are also possible based on
an empirical assessment. For instance, the second method can be controlled in a way
such that it is only performed on the top-level structures of the arguments of the
clause head.

Another possible extension to the work in this thesis is the reduction of the
so-called occurs-check in unification. In first-order unification, this check corresponds to
examining the structure of the term t to ensure it does not contain occurrences of the
logic variable X at the time when an attempt is made to bind X to t. This check is
generalized in the context of higher-order pattern unification. It can be observed
from Figure 4.3 and Figure 4.4 that the occurs-check is needed in unifying a pair ⟨X, t⟩
for the following three reasons.

1. The logic variable X could occur in t, where non-unifiability should be detected.

2. The term t could contain a rigid sub-term with its head being a constant c such
that c resides in a universe higher than that of X, which leads to non-unifiability.

3. The term t could contain a flexible sub-term (Y c1 ... cn), such that Y resides
in a universe that is lower than that of X, and the universe levels of some constants cj
in its argument list are higher than that of X. In this situation, the (implicit)
raising of X introduces a list of arguments which could be pruned against the
arguments of Y.
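
The three situations listed above can be sketched as a single traversal. The following Python function is a simplified illustration, not the procedure of Figures 4.3 and 4.4: the tuple encoding of terms, the function name occurs_check_reasons and the reason labels are all assumptions made for this example, and abstractions and bound variables are left out.

```python
# Assumed encoding:
#   ("const", name, universe)   a constant
#   ("var", name, universe)     a logic variable
#   ("app", head, [args])       an application; head is a const or a var

def occurs_check_reasons(x, t):
    """Return which of the three situations arise when binding x to t."""
    _, x_name, x_level = x
    reasons = set()
    kind = t[0]
    if kind == "var":
        if t[1] == x_name:
            reasons.add("occurs")          # situation 1
    elif kind == "const":
        if t[2] > x_level:
            reasons.add("universe")        # situation 2
    else:
        head, args = t[1], t[2]
        if head[0] == "var":
            if head[1] == x_name:
                reasons.add("occurs")      # situation 1, x as a head
            elif head[2] < x_level and any(c[2] > x_level for c in args):
                reasons.add("pruning")     # situation 3
        else:
            # Rigid sub-term: inspect the head constant and every argument.
            reasons |= occurs_check_reasons(x, head)
            for arg in args:
                reasons |= occurs_check_reasons(x, arg)
    return reasons

X = ("var", "X", 1)
f = ("const", "f", 0)
t_occurs = ("app", f, [X])
t_universe = ("app", f, [("const", "d", 2)])
t_pruning = ("app", ("var", "Y", 0), [("const", "d", 2)])
t_simple = ("app", f, [("const", "c", 0)])
```

An empty result corresponds to the case in which X could simply be bound to t; any non-empty result means the structure of t had to be examined.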

Performing the occurs-check is generally viewed as expensive, since without it
the pair ⟨X, t⟩ can be solved by simply binding X to
t without any traversal over the structure of t. In conventional implementations
of Prolog and in the Prolog/Mali implementation of λProlog, the occurs-check is left
out entirely. In the Teyjus family of implementations, the occurs-check is performed
in the following way. A register VAR (TY_VAR for the first-order occurs-check on
types) is used to record the variable (type variable) for which a binding is being
calculated, and is checked against the structure of the term (type) that constitutes
the other element of the disagreement pair during the interpretive unification
process. These registers are also set during the execution of the instructions get_structure
(get_typed_structure) and get_type_structure when the incoming term or type is a
variable or a type variable. In this case the computation starts to construct a first-order
application (type structure) as the binding for it, and the register communicates the
(type) variable whose occurrence should be checked to the interpretive unification
invoked by the following unify_value or unify_type_value instructions that correspond
to the arguments of the enclosing first-order application or type structure.

Optimizations that are targeted towards avoiding unnecessary occurs-checks could
be significant to the performance of implementations of our language. In fact,
one such optimization is already present in our compilation model. This optimization
happens in the compilation of the pair ⟨X, t⟩, where X is the first occurrence of
a variable that is universally quantified at the clause head. In this situation, it
can be observed that none of the three cases requiring the occurs-check described above
can actually happen. In particular, a new logic variable, say X', with the current
universe level is introduced to replace X when the clause definition is selected to
solve an incoming goal. Since X is in its first occurrence, it is impossible for X'
to be contained in any other term. Next, the universe index of X' is already the
largest one in the current computation context, so that the possibility for the second
situation to occur is also eliminated. Finally, the raising of X' against any flexible
Lλ-subterm (Y b1 ... bn) contained in t results in an argument list for X' in which
all the constants in [b1, ..., bn] are contained, since X' has the highest universe index,
and consequently nothing can be pruned in this argument list against [b1, ..., bn]. For
these reasons, X' can be immediately bound to t. Such a special treatment of bindings
without the occurs-check is in fact captured by the instructions

get_variable Ai, Aj and unify_variable Ai,

the execution of which simply copies the content of Aj (S for the latter) into the register
Ai. A similar optimization also exists for compiled type unification through the usage
of

get_type_variable Ai, Aj and unify_type_variable Ai.

Research in [59] and [60] proposes an optimization, called linearization, for
minimizing the occurs-check, similar to that in our compilation model, in handling higher-order
pattern unification within a dependently typed λ-calculus [58]. When adopted into
our context, this approach suggests a pre-processing step in compilation that translates
clause definitions into a form in which every subsequent occurrence of a variable in a
clause head is replaced by a new variable in its first use, with additional unifications
between the new variable and the one it replaces inserted at the beginning of
the clause body. For instance, suppose a clause under consideration is of the form

foo X (f X) t :- <goal>,

where t is some arbitrary argument. The linearization result becomes a clause

foo X (f Z) t :- X = Z, <goal>.
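
The linearization step just described can be sketched as a small source-to-source transformation. The Python code below is an illustrative sketch under stated assumptions and is not the procedure of [59] or [60]: clause heads are encoded as nested tuples of strings, names starting with an upper-case letter are taken to be variables, and the fresh names Z1, Z2, ... are assumed not to clash with names already in the clause.

```python
def linearize(term, seen, equations):
    """Replace repeated variable occurrences in term by fresh variables.

    seen is the set of variables already encountered; equations collects
    (original, fresh) pairs to be unified at the start of the clause body.
    """
    if isinstance(term, str):
        if term[:1].isupper():               # a variable (assumed convention)
            if term in seen:
                fresh = f"Z{len(equations) + 1}"
                equations.append((term, fresh))
                return fresh
            seen.add(term)
        return term                           # first occurrence or constant
    # Compound term: process the components from left to right.
    return tuple(linearize(sub, seen, equations) for sub in term)

# Head of the clause  foo X (f X) t :- <goal>.
head = ("foo", "X", ("f", "X"), "t")
seen, equations = set(), []
new_head = linearize(head, seen, equations)
print(new_head)
print(equations)
```

For the example clause this yields the head foo X (f Z1) t together with the single equation between X and Z1, matching the linearized clause shown above up to the name of the fresh variable.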

Within the computation context considered by [59] and [60], where no compilation
of unification is considered, this approach has a significant effect since, after
linearization, the bindings from a variable to a term required in the matching of a clause
head can be simply performed without the occurs-check during their interpretive
computation. However, in our context, this approach actually has almost the same effect as
our special treatment of the first occurrence of variables described above, except that
computations requiring the occurs-check are further delayed till the end of the processing
of the clause head. The usefulness of this delay is arguable. Considering the example
above, suppose the argument t in the clause is a constant c and further the query has
the form

?- foo W (f (g W)) d.

where d is a constant different from c. The delay of the unification over ⟨W, (g W)⟩
is beneficial here since failure will be simply recognized from the inequality between
c and d. However, in another case, suppose the third argument of the clause and the
query are of the form (f (f (f (g c)))) and (f (f (f (g d)))) respectively, where the
non-matching constants c and d are embedded deeply inside. Here the eager calculation
over ⟨W, (g W)⟩ becomes more efficient than actually carrying out the unification on

(f (f (f (g c)))) and (f (f (f (g d)))).

A more useful solution to this problem that can be considered is to build a mecha- 
nism to dynamically detect the absence of the three situations requiring occurs-check 
described before, and perform the simple binding when it is the case. For example, 
compound terms can be attributed with the maximum universe index of the constants 
contained inside, and an additional attribute can be associated with logic variables to 
indicate whether they are in their first occurrence. Such attributes should be main- 
tained by the unification and normalization processes for them to have any practical 
value. A specific approach of this sort is to be investigated. 
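
One possible shape for such attributes is sketched below in Python. This is only a hypothetical illustration of the idea and not a design from the thesis: the class names, the first_occurrence flag and the test can_bind_without_check are all assumptions, and the sketch takes the constants appearing as arguments of flexible sub-terms to be included in the cached maximum.

```python
from dataclasses import dataclass, field

@dataclass
class Const:
    name: str
    universe: int

    @property
    def max_universe(self):
        return self.universe

@dataclass
class Var:
    name: str
    universe: int
    first_occurrence: bool = True    # assumed flag kept up to date at runtime

    @property
    def max_universe(self):
        return 0                     # variables contribute no constants

@dataclass
class App:
    head: object
    args: list
    max_universe: int = field(init=False)

    def __post_init__(self):
        # Cache the largest universe index of any constant inside the term.
        parts = [self.head] + list(self.args)
        self.max_universe = max(p.max_universe for p in parts)

def can_bind_without_check(x, t):
    """True when none of the three occurs-check situations can arise.

    A variable in its first occurrence cannot occur in t, and if no constant
    in t lives above the universe of x, neither a universe clash nor a
    pruning step is possible.
    """
    return x.first_occurrence and t.max_universe <= x.universe

x = Var("X", 1)
t_ok = App(Const("f", 0), [Const("c", 1)])
t_high = App(Const("f", 0), [Const("d", 2)])
shared = Var("W", 1, first_occurrence=False)
```

The attribute is computed once when a term is built, so the test itself needs no traversal; the cost moves to keeping the attributes correct as terms are normalized and variables are bound.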

In addition to improving our abstract machine and processing structure,
enhancements can also be made to the system that has been implemented. First, a
compilation treatment can be considered for handling the top-level queries in our
system. In the absence of such compilation, queries are restricted to not containing
augment goals. A compiled treatment would allow us to lift this restriction. Second,
the explicit treatment of disjunctive goals by the abstract machine discussed
in Section 6.5 could also be beneficial to the performance of the system. Finally, a
garbage collector for the emulator is also an important enhancement to our system.
The construction of such a garbage collector is, in fact, currently under investigation.

Many of the implementation ideas developed in this thesis seem not to be limited
to λProlog and should be of use within the broader framework of implementing
higher-order features in logic programming and reasoning systems. Specifically, these ideas
may be applicable in the context of logic programming within a dependently typed
λ-calculus [58], and of meta-theory based reasoning about computational systems [3, 17].
These kinds of systems seem to be of growing importance within the specification
and verification realm. It would be of interest, therefore, to investigate the actual
applications of our ideas in these more general settings.